benchmark local llms.no
curated stem questions from mmlu — cs, math, science, engineering. runs entirely in your browser via webgpu. compete on a global leaderboard.
select a model
9 models from 0.6b to 8b parameters — qwen, llama, gemma, phi, deepseek r1, mistral, smollm2. all running locally via webgpu. no api keys. no server.

qwen3 0.6b · 0.6B · qwen · ~400mb
llama 3.2 1b · 1B · llama · ~720mb
smollm2 1.7b · 1.7B · smollm2 · ~1.1gb
gemma 2 2b · 2B · gemma · ~1.3gb
llama 3.2 3b · 3B · llama · ~1.8gb
phi 3.5 mini · 3.8B · phi · ~2.2gb
qwen3 4b · 4B · qwen · ~2.5gb
mistral 7b · 7B · mistral · ~4.3gb
deepseek r1 8b · 8B · deepseek · ~5.0gb
downloads once, cached locally · runs entirely on your gpu
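a minimal sketch of how a page might check for webgpu before offering a model download. this is not the site's actual code: `hasWebGPU`, `GpuLike` and `NavigatorLike` are hypothetical names, and the only real browser api assumed is `navigator.gpu.requestAdapter()`.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles
// without pulling in the @webgpu/types package.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Resolves to true only when the WebGPU entry point exists
// and the browser returns a non-null adapter.
async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // API not exposed by this browser
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "not available"
  }
}
```

in a browser this could be called as `hasWebGPU(navigator as unknown as NavigatorLike)`, with the model picker shown only when the promise resolves to `true`.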
run the suite
multiple-choice questions from mmlu — cs, math, science, engineering. hard enough that even 8b models score under 60%. no ceiling.
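grading a multiple-choice answer means pulling a single letter out of a free-form model response. the sketch below shows one possible way to do that; `extractChoice` and `isCorrect` are hypothetical helpers, and the matching rules are an assumption, not the suite's actual grader.

```typescript
type Choice = "A" | "B" | "C" | "D";

// Hypothetical helper: find the answer letter in a free-form response.
// Prefers the last explicit "answer is X" / "answer: X" statement and
// falls back to the last standalone capital letter A-D.
function extractChoice(response: string): Choice | null {
  let last: string | null = null;
  let m: RegExpExecArray | null;

  const explicit = /[Aa]nswer\s*(?:is|:)?\s*\(?([A-D])\b/g;
  while ((m = explicit.exec(response)) !== null) last = m[1];

  if (last === null) {
    // Naive fallback: this can misfire on a capital "A" used as an article.
    const standalone = /\b([A-D])\b/g;
    while ((m = standalone.exec(response)) !== null) last = m[1];
  }
  return last as Choice | null;
}

// A response counts as correct only if a letter was found and it matches.
function isCorrect(response: string, expected: Choice): boolean {
  return extractChoice(response) === expected;
}
```

taking the last match rather than the first is a deliberate choice here: reasoning models tend to discuss several options before committing to one.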
question 1 / 20 · 0 correct
cs · medium · computer-science-14
Which of the following sorting algorithms has the best average-case time complexity?
A. Bubble sort
B. Insertion sort
C. Merge sort
D. Selection sort
model response
get your report
see exactly where your model fails — broken down by subject and difficulty. click any question to review the full reasoning.
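a breakdown by subject or difficulty is a simple tally over per-question results. the sketch below assumes a hypothetical `Result` shape and a hypothetical `breakdown` helper; it only counts correct answers and does not attempt to reproduce how the point score is computed.

```typescript
// Hypothetical per-question record; field names are assumptions.
interface Result {
  id: string;
  subject: string;
  difficulty: "easy" | "medium" | "hard";
  correct: boolean;
}

interface Tally {
  correct: number;
  total: number;
}

// Group results by the chosen key and count correct answers per group.
function breakdown(
  results: Result[],
  key: "subject" | "difficulty"
): Record<string, Tally> {
  const out: Record<string, Tally> = {};
  for (const r of results) {
    const k = r[key];
    const t = out[k] ?? (out[k] = { correct: 0, total: 0 });
    t.total += 1;
    if (r.correct) t.correct += 1;
  }
  return out;
}
```

calling `breakdown(results, "difficulty")` gives one tally per difficulty level, which is enough to render lines such as "easy 7/8".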
438 / 1000
easy · 7/8 · +0pts
medium · 6/8 · +0pts
hard · 4/4 · +0pts
questions
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20