benchmark local llms.no
a standardized suite of math, logic, coding, and reasoning questions. runs entirely in your browser via webgpu.
how it works
01 · select
02 · run
03 · report
select a model
9 models from 0.6b to 8b parameters, all running locally in your browser via webgpu. no api keys. no server.
qwen 3 0.6b · 0.6B params · ~400mb
llama 3.2 1b · 1B params · ~720mb
qwen 2.5 1.5b · 1.5B params · ~950mb
qwen 3 1.7b · 1.7B params · ~1.1gb
gemma 2 2b · 2B params · ~1.3gb
llama 3.2 3b · 3B params · ~1.8gb
qwen 3 4b · 4B params · ~2.5gb
qwen 2.5 7b · 7B params · ~4.2gb
qwen 3 8b · 8B params · ~4.9gb
downloads once, cached locally · runs entirely on your gpu
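before offering the model list, a page like this has to confirm the browser can actually reach the gpu. a minimal sketch of that check, using the standard webgpu entry point `navigator.gpu` (the `checkWebGPU` helper name and the returned object shape are assumptions, not this site's actual code):

```javascript
// hypothetical helper: detect whether this browser can run models on the gpu.
// navigator.gpu is the standard WebGPU entry point; it is undefined in
// browsers and runtimes without WebGPU support.
async function checkWebGPU() {
  if (typeof navigator === "undefined" || !navigator.gpu) {
    return { supported: false, reason: "webgpu not available" };
  }
  // requestAdapter() resolves to null when no suitable gpu is found.
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no gpu adapter found" };
  }
  return { supported: true, reason: "ok" };
}
```

on an unsupported browser the helper degrades gracefully instead of throwing, which is what lets a page show a "webgpu not available" message rather than a blank screen.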
run the suite
20 questions across math, logic, coding, and reasoning. each question tests multi-step reasoning — the key discriminator between model sizes.
question 1 / 20 · 0 correct
math · medium · math-012
What is the sum of the first 50 positive integers? Is it divisible by 3?
model response
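a grading pass for a question like the one above can be sketched as a simple answer-extraction check. the question object shape and the `grade` helper are assumptions for illustration, not the site's actual code (for math-012, the sum is 50·51/2 = 1275, and 1275 = 3·425, so it is divisible by 3):

```javascript
// hypothetical question record, mirroring the sample question shown above.
const question = {
  id: "math-012",
  subject: "math",
  difficulty: "medium",
  prompt: "What is the sum of the first 50 positive integers? Is it divisible by 3?",
  // 1 + 2 + ... + 50 = 50 * 51 / 2 = 1275; 1275 / 3 = 425, so "yes".
  expected: ["1275", "yes"],
};

// hypothetical grader: a response passes only if it contains every expected token.
function grade(q, response) {
  const text = response.toLowerCase();
  return q.expected.every((token) => text.includes(token.toLowerCase()));
}

// grade(question, "The sum is 1275. Yes, it is divisible by 3.") → true
// grade(question, "The sum is 1274, and no.") → false
```

substring matching is crude but cheap, and it keeps grading fully client-side, which matters when there is no server to call.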
get your report
accuracy by difficulty is the clearest signal. small models collapse on hard questions; larger ones hold up. subject breakdown shows where capability drops off.
difficulty curve
easy · 14/17 · 82%
medium · 9/17 · 53%
hard · 3/16 · 19%
by subject
math · 60%
logic · 75%
coding · 45%
reasoning · 70%
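the two breakdowns above can be produced by one grouping pass over the graded results. a minimal sketch, assuming a flat array of per-question results (the `accuracyBy` helper and the result shape are hypothetical):

```javascript
// hypothetical aggregation: bucket graded results by a field (difficulty or
// subject) and compute rounded percentage accuracy per bucket.
function accuracyBy(results, key) {
  const buckets = {};
  for (const r of results) {
    const k = r[key];
    buckets[k] = buckets[k] || { correct: 0, total: 0 };
    buckets[k].total += 1;
    if (r.correct) buckets[k].correct += 1;
  }
  const out = {};
  for (const [k, b] of Object.entries(buckets)) {
    out[k] = Math.round((100 * b.correct) / b.total);
  }
  return out;
}

// made-up example results, not real benchmark data:
const sampleResults = [
  { difficulty: "easy", subject: "math", correct: true },
  { difficulty: "easy", subject: "logic", correct: true },
  { difficulty: "hard", subject: "coding", correct: false },
];
// accuracyBy(sampleResults, "difficulty") → { easy: 100, hard: 0 }
```

the same function serves both report tables: call it once with `"difficulty"` for the difficulty curve and once with `"subject"` for the subject breakdown.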