A real LLM benchmark, running entirely in your browser.

We download a quantized LLM to your browser. Your GPU runs it via WebGPU. We score it on multiple-choice accuracy and show you exactly what it wrote — no

9 quantized models

0.6B to 8B parameters. Downloaded once, cached locally, and run entirely on your GPU via WebGPU — no API calls leave your device.

qwen3 0.6b · 0.6B · ~400 MB
llama 3.2 1b · 1B · ~720 MB
smollm2 1.7b · 1.7B · ~1.1 GB
gemma 2 2b · 2B · ~1.3 GB
llama 3.2 3b · 3B · ~1.8 GB
phi 3.5 mini · 3.8B · ~2.2 GB
qwen3 4b · 4B · ~2.5 GB
mistral 7b · 7B · ~4.3 GB
deepseek r1 8b · 8B · ~5.0 GB
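
For a rough sense of how compressed these downloads are, the listed sizes can be turned into effective bits per parameter. This is a back-of-the-envelope sketch: the parameter counts are the rounded nominal figures from the list, sizes are assumed to be decimal megabytes, and `bitsPerParam` is a hypothetical helper, not part of the benchmark.

```typescript
// Nominal parameter count (billions) and approximate download size (MB),
// copied from the model list. Sizes are treated as decimal MB (an assumption).
interface ModelEntry {
  label: string;
  paramsBillions: number;
  sizeMB: number;
}

const models: ModelEntry[] = [
  { label: "qwen3 0.6b", paramsBillions: 0.6, sizeMB: 400 },
  { label: "llama 3.2 1b", paramsBillions: 1, sizeMB: 720 },
  { label: "smollm2 1.7b", paramsBillions: 1.7, sizeMB: 1100 },
  { label: "gemma 2 2b", paramsBillions: 2, sizeMB: 1300 },
  { label: "llama 3.2 3b", paramsBillions: 3, sizeMB: 1800 },
  { label: "phi 3.5 mini", paramsBillions: 3.8, sizeMB: 2200 },
  { label: "qwen3 4b", paramsBillions: 4, sizeMB: 2500 },
  { label: "mistral 7b", paramsBillions: 7, sizeMB: 4300 },
  { label: "deepseek r1 8b", paramsBillions: 8, sizeMB: 5000 },
];

// Effective bits per parameter = (download bytes * 8) / parameter count.
function bitsPerParam(paramsBillions: number, sizeMB: number): number {
  const totalBits = sizeMB * 1e6 * 8;
  const totalParams = paramsBillions * 1e9;
  return totalBits / totalParams;
}

for (const m of models) {
  const bits = bitsPerParam(m.paramsBillions, m.sizeMB).toFixed(1);
  console.log(`${m.label}: ~${bits} bits/param`);
}
```

With the listed figures, every model lands between roughly 4.6 and 5.8 bits per parameter, which would be consistent with weights stored at around 4 bits plus overhead. The page does not state the quantization format, so treat that reading as an inference.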

5-shot MMLU evaluation

The same 5-shot format used to benchmark frontier models: five solved examples precede every question, and results are shown alongside published baselines for context.
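
As an illustration of what "five solved examples precede every question" means in practice, here is a minimal sketch of a few-shot prompt builder. The header sentence and the `Answer:` layout follow the commonly used MMLU prompt style. The page's actual template is not shown, so `buildFewShotPrompt`, `McQuestion`, the placeholder shots, and the subject string are all assumptions.

```typescript
type Letter = "A" | "B" | "C" | "D";

interface McQuestion {
  question: string;
  choices: [string, string, string, string];
  answer?: Letter; // set for solved examples, omitted for the question under test
}

const LETTERS: readonly Letter[] = ["A", "B", "C", "D"];

// Render one question; solved examples end with their answer letter,
// the target question ends with a bare "Answer:" for the model to complete.
function formatQuestion(q: McQuestion): string {
  const choiceLines = q.choices.map((choice, i) => `${LETTERS[i]}. ${choice}`);
  const answerLine = q.answer ? `Answer: ${q.answer}` : "Answer:";
  return [q.question, ...choiceLines, answerLine].join("\n");
}

// Header, then the solved shots, then the open target question.
function buildFewShotPrompt(subject: string, shots: McQuestion[], target: McQuestion): string {
  const header = `The following are multiple choice questions (with answers) about ${subject}.`;
  const solved = shots.map(formatQuestion);
  const open = formatQuestion({ question: target.question, choices: target.choices });
  return [header, ...solved, open].join("\n\n");
}

// Placeholder shots for illustration only; a real run would draw them from the dataset.
const placeholderShots: McQuestion[] = [1, 2, 3, 4, 5].map((n): McQuestion => ({
  question: `Placeholder solved question ${n}`,
  choices: ["option one", "option two", "option three", "option four"],
  answer: "A",
}));

const targetQuestion: McQuestion = {
  question: "Which type of attack involves an attacker inserting themselves between two communicating parties?",
  choices: ["SQL injection", "Man-in-the-middle", "Cross-site scripting", "Buffer overflow"],
};

// "computer security" is an assumed subject name for the header.
const fewShotPrompt = buildFewShotPrompt("computer security", placeholderShots, targetQuestion);
console.log(fewShotPrompt);
```

Ending the prompt on a bare `Answer:` nudges the model to continue with a single letter, which keeps the later answer-parsing step simple.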

Question 1 of 3 · cs · hard · 5-shot · temp=0.1

Which type of attack involves an attacker inserting themselves between two communicating parties?

A. SQL injection
B. Man-in-the-middle
C. Cross-site scripting
D. Buffer overflow

Honest report

Accuracy with a 95% Wilson confidence interval, and the full raw output of every question. No polish on top of weak models — you see what they actually wrote, including the spirals and the hedges.
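
For readers who want to check the interval themselves, the Wilson score interval can be sketched as follows. This uses the standard formula with z = 1.96 for 95% confidence. The names `wilsonInterval` and `agreesWithReference` are hypothetical, and the page's own implementation may differ in details such as rounding.

```typescript
interface Interval {
  lower: number;
  upper: number;
}

// Wilson score interval for a binomial proportion (correct out of total).
// z = 1.96 corresponds to a 95% confidence level.
function wilsonInterval(correct: number, total: number, z: number = 1.96): Interval {
  if (total <= 0) throw new Error("total must be positive");
  const p = correct / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = (p + z2 / (2 * total)) / denom;
  const halfWidth = (z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total))) / denom;
  return {
    lower: Math.max(0, center - halfWidth),
    upper: Math.min(1, center + halfWidth),
  };
}

// A published baseline "agrees" with a run when it falls inside the run's interval.
function agreesWithReference(interval: Interval, reference: number): boolean {
  return reference >= interval.lower && reference <= interval.upper;
}

// Example: 2 correct out of 3, compared against a 58.2% published reference.
const runInterval = wilsonInterval(2, 3);
console.log(runInterval, agreesWithReference(runInterval, 0.582));
```

With only three questions the interval is very wide, roughly 21% to 94% for 2 of 3 correct, so almost any plausible baseline falls inside it. The interval is also not symmetric around the observed accuracy, which is expected for Wilson intervals at small n.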

raw model output captured for every question · five-tier answer extraction
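
The page mentions five-tier answer extraction without describing the tiers. The sketch below shows the general idea of layered extraction with three hypothetical tiers, tried from strictest to loosest. The patterns, the tier labels, and `extractAnswer` are illustrative assumptions, not the benchmark's actual rules.

```typescript
type AnswerLetter = "A" | "B" | "C" | "D";

interface Extraction {
  letter: AnswerLetter;
  tier: number; // 1 = strictest pattern that matched
}

// Hypothetical tiers, ordered from most to least explicit.
const TIERS: { label: string; pattern: RegExp }[] = [
  // "Answer: B", "answer is (B)"; the lookahead rejects words like "Buffer".
  { label: "explicit answer phrase", pattern: /[Aa]nswer\s*(?:is|:)?\s*\(?([ABCD])\)?(?![A-Za-z])/ },
  // A line that starts with "B.", "B)", "(B)" or "B:".
  { label: "letter with delimiter at line start", pattern: /^\s*\(?([ABCD])[).:]/m },
  // Last resort: any standalone capital A-D. Prone to false positives (e.g. the article "A").
  { label: "standalone letter", pattern: /\b([ABCD])\b/ },
];

// Try each tier in order and report which one produced the answer.
function extractAnswer(raw: string): Extraction | null {
  let tierNumber = 0;
  for (const tier of TIERS) {
    tierNumber += 1;
    const letter = tier.pattern.exec(raw)?.[1];
    if (letter) {
      return { letter: letter as AnswerLetter, tier: tierNumber };
    }
  }
  return null; // nothing parseable; the question counts as not extracted
}

console.log(extractAnswer("Answer: B"));
console.log(extractAnswer("B. Man-in-the-middle"));
console.log(extractAnswer("The answer is Buffer overflow"));
```

Recording which tier matched makes it possible to audit how often the loose fallbacks are doing the work, which matters for small models that ramble before committing to a letter.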
published reference: 58.2% · your run agrees within CI ✓
accuracy: 67% ±28% · n=3 · 95% Wilson CI
correct: 2 / 3 questions answered correctly
extracted: 3 / 3 answers parsed from raw output
subject: cs · math (topics sampled this run)
5-shot MMLU · Wilson 95% CI · layered answer extraction · WebGPU inference