A real LLM benchmark, running entirely in your browser.

We download a quantized LLM to your browser. Your GPU runs it via WebGPU. We score it on multiple-choice accuracy and show you exactly what it wrote — no


9 quantized models

0.6B to 8B parameters. Downloaded once, cached locally, and run entirely on your GPU via WebGPU — no API calls leave your device.

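Qwen3 0.6B · 0.6B parameters · ~400 MB

Llama 3.2 1B · 1B parameters · ~720 MB

SmolLM2 1.7B · 1.7B parameters · ~1.1 GB

Gemma 2 2B · 2B parameters · ~1.3 GB

Llama 3.2 3B · 3B parameters · ~1.8 GB

Phi 3.5 mini · 3.8B parameters · ~2.2 GB

Qwen3 4B · 4B parameters · ~2.5 GB

Mistral 7B · 7B parameters · ~4.3 GB

DeepSeek R1 8B · 8B parameters · ~5.0 GB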

5-shot MMLU evaluation

The same 5-shot format used to benchmark frontier models — five solved examples precede every question, shown alongside published baselines for context.

Question 1 of 3 · cs · hard · 5-shot · temp=0.1

Which type of attack involves an attacker inserting themselves between two communicating parties?

A. SQL injection
B. Man-in-the-middle
C. Cross-site scripting
D. Buffer overflow
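As a rough illustration of the 5-shot format, the sketch below assembles a prompt from solved examples followed by one open question. The `McItem` type and the `formatItem` and `buildFewShotPrompt` helpers are hypothetical names invented for this example, and the header line follows the wording commonly used for MMLU prompts; the benchmark's actual prompt template may differ.

```typescript
// Hypothetical shape for a multiple-choice item (an assumption, not the benchmark's own type).
interface McItem {
  question: string;
  choices: [string, string, string, string];
  answer?: "A" | "B" | "C" | "D"; // set for solved examples, omitted for the open question
}

const OPTION_LETTERS = ["A", "B", "C", "D"] as const;

// Render one item; a solved item ends with its answer letter, an open one with a bare "Answer:".
function formatItem(item: McItem): string {
  const options = item.choices
    .map((choice, i) => `${OPTION_LETTERS[i]}. ${choice}`)
    .join("\n");
  const answerLine = item.answer ? `Answer: ${item.answer}` : "Answer:";
  return `${item.question}\n${options}\n${answerLine}`;
}

// Place the solved examples (five, for 5-shot) ahead of the question being scored.
function buildFewShotPrompt(subject: string, shots: McItem[], target: McItem): string {
  const header = `The following are multiple choice questions (with answers) about ${subject}.`;
  const solved = shots.map(formatItem);
  // Drop any answer on the target so the model has to supply it.
  const open = formatItem({ question: target.question, choices: target.choices });
  return [header, ...solved, open].join("\n\n");
}
```

Called with five solved items and the man-in-the-middle question above, this yields a prompt that ends in a bare `Answer:` line for the model to complete.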


Honest report

Accuracy with a 95% Wilson confidence interval, and the full raw output of every question. No polish on top of weak models — you see what they actually wrote, including the spirals and the hedges.
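For readers who want to see how a Wilson interval behaves on very small runs, here is a minimal sketch of the standard Wilson score formula. The function names `wilsonInterval` and `agreesWithReference` are made up for this example, and the "agrees" check is only one plausible reading of "within CI" (the published figure falls inside the interval); the site's own code may compute either differently.

```typescript
// Wilson score interval for a binomial proportion.
// z = 1.96 corresponds to a two-sided 95% confidence level.
function wilsonInterval(
  correct: number,
  n: number,
  z: number = 1.96
): { low: number; high: number } {
  if (n <= 0) throw new Error("n must be positive");
  const p = correct / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { low: Math.max(0, center - half), high: Math.min(1, center + half) };
}

// Assumed interpretation of "agrees within CI": the reference accuracy lies inside the interval.
function agreesWithReference(correct: number, n: number, reference: number): boolean {
  const { low, high } = wilsonInterval(correct, n);
  return reference >= low && reference <= high;
}
```

With 2 correct out of 3, this gives an interval of roughly 0.21 to 0.94, so a reference of 0.582 falls inside it. The interval is not symmetric around the observed 67%, which is why a single ± figure can only approximate it at such small n.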


raw model output captured for every question · five-tier answer extraction

published reference: 58.2% · your run agrees within CI ✓

accuracy: 67% ±28% · n=3 · 95% Wilson CI

correct: 2 / 3 questions answered correctly

extracted: 3 / 3 answers parsed from raw output

subject: cs · math (topics sampled this run)

5-shot MMLU · Wilson 95% CI · layered answer extraction · WebGPU inference
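The page does not spell out what the five extraction tiers are, so the sketch below is purely illustrative: it tries five assumed strategies in order, from most to least explicit, and reports which tier produced the answer. Every pattern, the tier ordering, and the names `extractAnswer` and `lastCapture` are assumptions made for this example.

```typescript
type Letter = "A" | "B" | "C" | "D";
interface Extraction { letter: Letter; tier: number }

const LETTERS: Letter[] = ["A", "B", "C", "D"];

// Return the first capture group of the last match, so later statements win over earlier ones.
function lastCapture(text: string, pattern: RegExp): string | undefined {
  const re = new RegExp(pattern.source, "gm");
  let found: string | undefined;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) found = m[1];
  return found;
}

// Hypothetical five-tier extraction; the real tiers are not documented on the page.
function extractAnswer(raw: string, choices: string[]): Extraction | null {
  // Tier 1: explicit statement such as "Answer: B" or "the answer is (B)".
  const explicit = lastCapture(raw, /[Aa]nswer(?:\s+is)?\s*:?\s*\(?([A-D])\b/);
  if (explicit) return { letter: explicit as Letter, tier: 1 };

  // Tier 2: a parenthesized letter such as "(C)".
  const parenthesized = lastCapture(raw, /\(([A-D])\)/);
  if (parenthesized) return { letter: parenthesized as Letter, tier: 2 };

  // Tier 3: a line that starts with a letter and a delimiter, such as "B." or "B)".
  const leading = lastCapture(raw, /^\s*([A-D])[.)]/);
  if (leading) return { letter: leading as Letter, tier: 3 };

  // Tier 4: exactly one option's text is quoted in the output.
  const lower = raw.toLowerCase();
  const mentioned: Letter[] = [];
  choices.forEach((choice, i) => {
    const letter = LETTERS[i];
    if (letter && lower.indexOf(choice.toLowerCase()) >= 0) mentioned.push(letter);
  });
  const only = mentioned.length === 1 ? mentioned[0] : undefined;
  if (only) return { letter: only, tier: 4 };

  // Tier 5 (last resort): the last standalone capital letter A-D anywhere in the text.
  const bare = lastCapture(raw, /\b([A-D])\b/);
  return bare ? { letter: bare as Letter, tier: 5 } : null;
}
```

Recording the tier alongside the letter makes it possible to see how often a model needed the looser fallbacks, which is useful when the raw output contains spirals and hedges rather than a clean answer line.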