A real LLM benchmark, running entirely in your browser.
We download a quantized LLM to your browser. Your GPU runs it via WebGPU. We score it on multiple-choice accuracy and show you exactly what it wrote — no
9 quantized models
0.6B to 8B parameters. Downloaded once, cached locally, and run entirely on your GPU via WebGPU — no API calls leave your device.

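Qwen3 0.6B · 0.6B parameters · ~400 MB
Llama 3.2 1B · 1B parameters · ~720 MB
SmolLM2 1.7B · 1.7B parameters · ~1.1 GB
Gemma 2 2B · 2B parameters · ~1.3 GB
Llama 3.2 3B · 3B parameters · ~1.8 GB
Phi 3.5 Mini · 3.8B parameters · ~2.2 GB
Qwen3 4B · 4B parameters · ~2.5 GB
Mistral 7B · 7B parameters · ~4.3 GB
DeepSeek R1 8B · 8B parameters · ~5.0 GB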
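As a rough sanity check on the download sizes listed above, the sketch below estimates how many bits each model stores per parameter. It assumes the sizes are decimal megabytes and gigabytes and that the parameter counts are the nominal ones shown; the page does not state which quantization scheme is used, so the result is only an approximation that ignores tokenizer and metadata overhead.

```python
# Nominal parameter counts (billions) and approximate download sizes (GB),
# copied from the model list. Sizes are assumed to be decimal units.
MODELS = {
    "Qwen3 0.6B": (0.6, 0.4),
    "Llama 3.2 1B": (1.0, 0.72),
    "SmolLM2 1.7B": (1.7, 1.1),
    "Gemma 2 2B": (2.0, 1.3),
    "Llama 3.2 3B": (3.0, 1.8),
    "Phi 3.5 Mini": (3.8, 2.2),
    "Qwen3 4B": (4.0, 2.5),
    "Mistral 7B": (7.0, 4.3),
    "DeepSeek R1 8B": (8.0, 5.0),
}


def bits_per_parameter(params_billion: float, size_gb: float) -> float:
    """Approximate stored bits per parameter (file overhead is ignored)."""
    # (size_gb * 1e9 bytes * 8 bits) / (params_billion * 1e9 parameters)
    return size_gb * 8 / params_billion


for name, (params, size) in MODELS.items():
    print(f"{name:<16} ~{bits_per_parameter(params, size):.1f} bits/param")
```

Under these assumptions every model lands between 4 and 6 bits per parameter, well below the 16 bits of a half-precision checkpoint.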
5-shot MMLU evaluation
The same 5-shot format used to benchmark frontier models — five solved examples precede every question, shown alongside published baselines for context.
Question 1 of 3 · cs · hard · 5-shot · temp=0.1
Which type of attack involves an attacker inserting themselves between two communicating parties?
A. SQL injection
B. Man-in-the-middle
C. Cross-site scripting
D. Buffer overflow
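To make the 5-shot format concrete, here is a minimal sketch of how such a prompt might be assembled. The header line and the "Answer:" layout follow a convention commonly used for MMLU-style prompts; the page does not show its exact template, so treat the template, the `Item` and `build_prompt` names, and the five solved examples (made-up placeholders) as assumptions.

```python
from dataclasses import dataclass

LETTERS = "ABCD"


@dataclass
class Item:
    question: str
    choices: list[str]
    answer: str = ""  # left empty for the question the model must answer


def format_item(item: Item) -> str:
    """Render one question with lettered choices and an 'Answer:' line."""
    lines = [item.question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, item.choices)]
    lines.append(f"Answer: {item.answer}".rstrip())
    return "\n".join(lines)


def build_prompt(subject: str, shots: list[Item], target: Item) -> str:
    """Five solved examples, then the unsolved target question."""
    if len(shots) != 5:
        raise ValueError("a 5-shot prompt needs exactly five solved examples")
    header = f"The following are multiple choice questions (with answers) about {subject}."
    unsolved = Item(target.question, target.choices)  # drop any answer key
    blocks = [header] + [format_item(s) for s in shots] + [format_item(unsolved)]
    return "\n\n".join(blocks)


# Placeholder solved examples, invented for illustration only.
SHOTS = [
    Item("Which data structure is first-in, first-out?", ["Stack", "Queue", "Heap", "Trie"], "B"),
    Item("Which port does HTTPS use by default?", ["21", "25", "80", "443"], "D"),
    Item("Which protocol translates domain names to IP addresses?", ["DNS", "FTP", "SMTP", "SSH"], "A"),
    Item("How many bits are in one byte?", ["4", "8", "16", "32"], "B"),
    Item("What is the worst-case time complexity of binary search?", ["O(1)", "O(log n)", "O(n)", "O(n log n)"], "B"),
]
TARGET = Item(
    "Which type of attack involves an attacker inserting themselves between two communicating parties?",
    ["SQL injection", "Man-in-the-middle", "Cross-site scripting", "Buffer overflow"],
)
PROMPT = build_prompt("computer science", SHOTS, TARGET)
print(PROMPT)
```

The prompt ends on a bare "Answer:" so that the model's next tokens are its choice for the target question.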
Honest report
Accuracy with a 95% Wilson confidence interval, and the full raw output of every question. No polish on top of weak models — you see what they actually wrote, including the spirals and the hedges.
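For readers who want to reproduce the confidence interval, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion. The function name and the fixed z = 1.96 (the usual 95% normal quantile) are choices made here for illustration, not details taken from the page's implementation.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion correct / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Example: 2 of 3 questions correct, compared with a 58.2% published reference.
low, high = wilson_interval(2, 3)
print(f"95% Wilson CI: {low:.1%} to {high:.1%}")
print("reference inside interval:", low <= 0.582 <= high)
```

Unlike a plain ± margin, the Wilson interval is not symmetric around the observed accuracy and stays inside 0% to 100%. With only three questions it is very wide, so agreement with a published reference is a weak check at that sample size.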
0% accuracy
raw model output captured for every question · five-tier answer extraction
published reference: 58.2% · your run agrees within CI ✓
accuracy: 67% ±28% · n=3 · 95% Wilson CI
correct: 2 / 3 questions answered correctly
extracted: 3 / 3 answers parsed from raw output
subject: cs · math (topics sampled this run)
5-shot MMLU · Wilson 95% CI · layered answer extraction · WebGPU inference
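The page does not spell out what its five extraction tiers are, so the sketch below is only one plausible way to layer answer extraction: try the strictest pattern first and fall back to looser ones, recording which tier matched. The tier definitions, the regular expressions, and the `extract_answer` name are all assumptions made for illustration.

```python
import re
from typing import Optional

LETTERS = "ABCD"


def extract_answer(raw: str, choices: list[str]) -> Optional[tuple[str, int]]:
    """Return (letter, tier) from raw model output, or None if nothing parses."""
    text = raw.strip()

    # Tier 1 (assumed): explicit "Answer: B" or "the answer is B"; last one wins.
    found = re.findall(r"(?i:answer)\s*(?:(?i:is)|:)?\s*\(?([A-D])\b", text)
    if found:
        return found[-1], 1

    # Tier 2 (assumed): a parenthesised letter such as "(C)".
    found = re.findall(r"\(([A-D])\)", text)
    if found:
        return found[-1], 2

    # Tier 3 (assumed): output begins with a bare letter, e.g. "B." or just "B".
    match = re.match(r"([A-D])(?:[.):]|$)", text)
    if match:
        return match.group(1), 3

    # Tier 4 (assumed): exactly one choice's text appears in the output.
    lowered = text.lower()
    hits = [i for i, choice in enumerate(choices) if choice.lower() in lowered]
    if len(hits) == 1:
        return LETTERS[hits[0]], 4

    # Tier 5 (assumed): last standalone capital letter A-D anywhere.
    found = re.findall(r"\b([A-D])\b", text)
    if found:
        return found[-1], 5

    return None


CHOICES = ["SQL injection", "Man-in-the-middle", "Cross-site scripting", "Buffer overflow"]
print(extract_answer("Let me think... The answer is B.", CHOICES))
print(extract_answer("It sounds like a man-in-the-middle attack", CHOICES))
print(extract_answer("no idea", CHOICES))
```

Returning the tier alongside the letter makes it possible to report how many answers needed a loose fallback, which fits the page's stated goal of showing what the model actually wrote.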