benchmark local llms.no

curated stem questions from mmlu — cs, math, science, engineering. runs entirely in your browser via webgpu. compete on a global leaderboard.

select a model

9 models from 0.6b to 8b parameters — qwen, llama, gemma, phi, deepseek r1, mistral, smollm2. all running locally via webgpu. no api keys. no server.

qwen3 0.6b · 0.6b params · ~400mb
llama 3.2 1b · 1b params · ~720mb
smollm2 1.7b · 1.7b params · ~1.1gb
gemma 2 2b · 2b params · ~1.3gb
llama 3.2 3b · 3b params · ~1.8gb
phi 3.5 mini · 3.8b params · ~2.2gb
qwen3 4b · 4b params · ~2.5gb
mistral 7b · 7b params · ~4.3gb
deepseek r1 8b · 8b params · ~5.0gb

downloads once, cached locally · runs entirely on your gpu
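before downloading several gigabytes of weights, it helps to know whether the browser exposes webgpu at all. the sketch below shows one way to check. it is an illustration, not this site's code. `hasWebGPU` and `GpuLike` are hypothetical names; `navigator.gpu.requestAdapter()` is the standard webgpu entry point and resolves to null when no suitable gpu is available.

```typescript
// Hypothetical webgpu availability check (a sketch, not this site's actual code).
// GpuLike is a minimal structural type so this compiles without @webgpu/types.
type GpuLike = { requestAdapter(): Promise<unknown | null> };

async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu only exists in webgpu-capable browsers and runtimes.
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  if (!nav || !nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no usable gpu adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    // Treat any failure during adapter discovery as "not available".
    return false;
  }
}

hasWebGPU().then((ok) => {
  console.log(ok ? "webgpu available" : "webgpu not available");
});
```

outside a webgpu-capable browser (for example in node) the check simply reports that webgpu is not available.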

run the suite

multiple-choice questions from mmlu — cs, math, science, engineering. hard enough that even 8b models score under 60%. no ceiling.

question 1 / 20 · 0 correct
cs · medium · computer-science-14

Which of the following sorting algorithms has the best average-case time complexity?

A. Bubble sort
B. Insertion sort
C. Merge sort
D. Selection sort
model response
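to grade a free-form model response against a multiple-choice key, a harness has to format the question and then pull a single letter back out of the reply. the sketch below is one possible approach under stated assumptions: the `Question` shape, the prompt wording, and the extraction heuristic (prefer an explicit "answer: x" or "answer is x", else take the last standalone capital a to d) are all hypothetical and are not this site's published method. it also strips `<think>...</think>` blocks, since reasoning models such as deepseek r1 emit them.

```typescript
// Hypothetical question shape; the site's real schema is not published here.
type Letter = "A" | "B" | "C" | "D";

interface Question {
  id: string; // e.g. "computer-science-14"
  subject: string; // e.g. "cs"
  difficulty: "easy" | "medium" | "hard";
  prompt: string;
  choices: [string, string, string, string];
  answer: Letter;
}

const LETTERS = ["A", "B", "C", "D"] as const;

// Build a plain multiple-choice prompt that asks for a single letter.
function formatPrompt(q: Question): string {
  const options = q.choices.map((c, i) => `${LETTERS[i]}. ${c}`).join("\n");
  return `${q.prompt}\n\n${options}\n\nAnswer with a single letter (A, B, C, or D).`;
}

// Extract the chosen letter from free-form text (heuristic, not exact).
function extractAnswer(response: string): Letter | null {
  // Drop any <think>...</think> reasoning section before searching.
  const text = response.replace(/<think>[\s\S]*?<\/think>/g, "");
  // Prefer an explicit "Answer: X", "answer is X" or "answer is (X)".
  const explicit = /[Aa]nswer(?:\s+is)?\s*:?\s*\(?([ABCD])\b/.exec(text);
  if (explicit) return explicit[1] as Letter;
  // Otherwise fall back to the last standalone capital A-D in the text.
  const re = /\b([ABCD])\b/g;
  let last: Letter | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) last = m[1] as Letter;
  return last;
}

// The sample question shown above; merge sort is O(n log n) on average.
const sample: Question = {
  id: "computer-science-14",
  subject: "cs",
  difficulty: "medium",
  prompt: "Which of the following sorting algorithms has the best average-case time complexity?",
  choices: ["Bubble sort", "Insertion sort", "Merge sort", "Selection sort"],
  answer: "C",
};

console.log(formatPrompt(sample));
console.log(extractAnswer("The answer is C.") === sample.answer);
```

the fallback is deliberately simple; a sentence that starts with the article "A" can fool it, which is one reason to ask for an explicit answer line in the prompt.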

get your report

see exactly where your model fails — broken down by subject and difficulty. click any question to review the full reasoning.
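a breakdown like "easy 7/8" is a per-group tally of correct answers. the sketch below shows the idea; the `Result` shape and the `breakdown` helper are hypothetical, and it does not try to reproduce the site's points or its score out of 1000, whose formula is not described here.

```typescript
// Hypothetical per-question result record.
type Difficulty = "easy" | "medium" | "hard";

interface Result {
  id: string;
  subject: string;
  difficulty: Difficulty;
  correct: boolean;
}

interface Tally {
  correct: number;
  total: number;
}

// Group results by any key (difficulty, subject, ...) and count correct/total.
function breakdown(results: Result[], key: (r: Result) => string): Map<string, Tally> {
  const out = new Map<string, Tally>();
  for (const r of results) {
    const k = key(r);
    const t = out.get(k) ?? { correct: 0, total: 0 };
    t.total += 1;
    if (r.correct) t.correct += 1;
    out.set(k, t);
  }
  return out;
}

const results: Result[] = [
  { id: "q01", subject: "cs", difficulty: "easy", correct: true },
  { id: "q02", subject: "math", difficulty: "easy", correct: false },
  { id: "q03", subject: "cs", difficulty: "hard", correct: true },
];

// Print lines in the report's "easy 1/2" style, in first-seen order.
breakdown(results, (r) => r.difficulty).forEach((t, k) => {
  console.log(`${k} ${t.correct}/${t.total}`);
});
```

passing `(r) => r.subject` instead gives the per-subject view with no other changes.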

438 / 1000
easy 7/8 · +0 pts
medium 6/8 · +0 pts
hard 4/4 · +0 pts
questions 01–20