benchmark local llms.no

a standardized suite of math, logic, coding, and reasoning questions. runs entirely in your browser via webgpu.

how it works

select a model

9 models from 0.6b to 8b parameters, all running locally in your browser via webgpu. no api keys. no server.
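Running locally means the page must first check that the browser exposes WebGPU at all. A minimal feature-detection sketch (the function name is illustrative; `navigator.gpu` and `requestAdapter` are the entry points defined by the WebGPU spec):

```javascript
// Sketch: feature-detect WebGPU before offering to load a model.
// navigator.gpu is absent in unsupported browsers and in
// non-browser runtimes, so the check must be defensive.
function hasWebGPU() {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}

async function pickAdapter() {
  if (!hasWebGPU()) return null;
  // requestAdapter resolves to null when no suitable GPU is available,
  // so callers should handle both failure modes the same way.
  return await navigator.gpu.requestAdapter();
}
```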

qwen 3 0.6b · 0.6B params · ~400mb
llama 3.2 1b · 1B params · ~720mb
qwen 2.5 1.5b · 1.5B params · ~950mb
qwen 3 1.7b · 1.7B params · ~1.1gb
gemma 2 2b · 2B params · ~1.3gb
llama 3.2 3b · 3B params · ~1.8gb
qwen 3 4b · 4B params · ~2.5gb
qwen 2.5 7b · 7B params · ~4.2gb
qwen 3 8b · 8B params · ~4.9gb

downloads once, cached locally · runs entirely on your gpu
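"Downloads once" is a cache-or-fetch pattern. In the browser the Cache API (`caches.open` / `cache.match` / `cache.put`) would play the role of `store` below; a plain Map stands in here so the logic is runnable anywhere, and the helper names are illustrative, not the app's actual API:

```javascript
// Sketch: serve model weights from a local store, hitting the
// network only on the first request for a given url.
async function getWeights(url, store, download) {
  const hit = store.get(url);
  if (hit !== undefined) return hit; // cached locally, no network
  const bytes = await download(url); // first visit only
  store.set(url, bytes);
  return bytes;
}
```

Subsequent runs of the suite then pay no download cost: the second call for the same url returns the stored bytes without invoking `download` again.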

run the suite

20 questions across math, logic, coding, and reasoning. each question tests multi-step reasoning, the key discriminator between model sizes.

question 1 / 20 · 0 correct
math · medium · math-012

What is the sum of the first 50 positive integers? Is it divisible by 3?

model response
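The sample question is a good illustration of multi-step reasoning: a model has to recall Gauss's formula, apply it, then run a divisibility check on the result. A worked sketch of the expected answer:

```javascript
// Step 1: the sum of the first n positive integers is n * (n + 1) / 2.
function sumFirstN(n) {
  return (n * (n + 1)) / 2;
}

// Step 2: apply it, then test divisibility.
const total = sumFirstN(50);          // 50 * 51 / 2 = 1275
const divisibleBy3 = total % 3 === 0; // digit sum 1+2+7+5 = 15, so yes
```

A small model that pattern-matches instead of computing typically gets one of the two steps wrong, which is exactly what the suite is probing for.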

get your report

accuracy by difficulty is the clearest signal. small models collapse on hard questions; larger ones hold up. subject breakdown shows where capability drops off.

difficulty curve
easy · 82% · 14/17
medium · 54% · 9/17
hard · 21% · 3/16
by subject
math · 60%
logic · 75%
coding · 45%
reasoning · 70%
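A report like the one above is just a per-bucket tally: each graded answer carries the question's difficulty and subject, and accuracy is correct over total for each bucket. A sketch (field names are illustrative, not the app's actual schema):

```javascript
// Sketch: group graded results by a key ("difficulty" or "subject")
// and compute rounded percentage accuracy per bucket.
function breakdown(results, key) {
  const buckets = {};
  for (const r of results) {
    const b = (buckets[r[key]] ??= { correct: 0, total: 0 });
    b.total += 1;
    if (r.correct) b.correct += 1;
  }
  for (const b of Object.values(buckets)) {
    b.accuracy = Math.round((100 * b.correct) / b.total);
  }
  return buckets;
}
```

Calling it once with `"difficulty"` and once with `"subject"` on the same result array yields both views of the report.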