benchmark local llms.no

a standardized suite of math, logic, coding, and reasoning questions. runs entirely in your browser via webgpu.

how it works

select a model

9 models from 0.6b to 8b parameters, all running locally in your browser via webgpu. no api keys. no server.
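Running locally means the page must first check that the browser exposes WebGPU at all. A minimal feature-detection sketch (the function name is illustrative; `navigator.gpu` and `requestAdapter` are the entry points defined by the WebGPU spec):

```javascript
// Sketch: feature-detect WebGPU before offering to load a model.
// navigator.gpu is absent in unsupported browsers and in
// non-browser runtimes, so the check must be defensive.
function hasWebGPU() {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}

async function pickAdapter() {
  if (!hasWebGPU()) return null;
  // requestAdapter resolves to null when no suitable GPU is available,
  // so callers should handle both failure modes the same way.
  return await navigator.gpu.requestAdapter();
}
```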

qwen 3 0.6b · 0.6B params · ~400mb
llama 3.2 1b · 1B params · ~720mb
qwen 2.5 1.5b · 1.5B params · ~950mb
qwen 3 1.7b · 1.7B params · ~1.1gb
gemma 2 2b · 2B params · ~1.3gb
llama 3.2 3b · 3B params · ~1.8gb
qwen 3 4b · 4B params · ~2.5gb
qwen 2.5 7b · 7B params · ~4.2gb
qwen 3 8b · 8B params · ~4.9gb

downloads once, cached locally · runs entirely on your gpu
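"Downloads once" is a cache-or-fetch pattern. In the browser the Cache API (`caches.open` / `cache.match` / `cache.put`) would play the role of `store` below; a plain Map stands in here so the logic is runnable anywhere, and the helper names are illustrative, not the app's actual API:

```javascript
// Sketch: serve model weights from a local store, hitting the
// network only on the first request for a given url.
async function getWeights(url, store, download) {
  const hit = store.get(url);
  if (hit !== undefined) return hit; // cached locally, no network
  const bytes = await download(url); // first visit only
  store.set(url, bytes);
  return bytes;
}
```

Subsequent runs of the suite then pay no download cost: the second call for the same url returns the stored bytes without invoking `download` again.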

run the suite

20 questions across math, logic, coding, and reasoning. each question tests multi-step reasoning, the key discriminator between model sizes.

question 1 / 20 · 0 correct
math · medium · math-012

What is the sum of the first 50 positive integers? Is it divisible by 3?

model response
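The sample question is a good illustration of multi-step reasoning: a model has to recall Gauss's formula, apply it, then run a divisibility check on the result. A worked sketch of the expected answer:

```javascript
// Step 1: the sum of the first n positive integers is n * (n + 1) / 2.
function sumFirstN(n) {
  return (n * (n + 1)) / 2;
}

// Step 2: apply it, then test divisibility.
const total = sumFirstN(50);          // 50 * 51 / 2 = 1275
const divisibleBy3 = total % 3 === 0; // digit sum 1+2+7+5 = 15, so yes
```

A small model that pattern-matches instead of computing typically gets one of the two steps wrong, which is exactly what the suite is probing for.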

get your report

accuracy by difficulty is the clearest signal. small models collapse on hard questions; larger ones hold up. subject breakdown shows where capability drops off.

difficulty curve
easy · 82% · 14/17
medium · 54% · 9/17
hard · 21% · 3/16
by subject
math · 60%
logic · 75%
coding · 45%
reasoning · 70%
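A report like the one above is just a per-bucket tally: each graded answer carries the question's difficulty and subject, and accuracy is correct over total for each bucket. A sketch (field names are illustrative, not the app's actual schema):

```javascript
// Sketch: group graded results by a key ("difficulty" or "subject")
// and compute rounded percentage accuracy per bucket.
function breakdown(results, key) {
  const buckets = {};
  for (const r of results) {
    const b = (buckets[r[key]] ??= { correct: 0, total: 0 });
    b.total += 1;
    if (r.correct) b.correct += 1;
  }
  for (const b of Object.values(buckets)) {
    b.accuracy = Math.round((100 * b.correct) / b.total);
  }
  return buckets;
}
```

Calling it once with `"difficulty"` and once with `"subject"` on the same result array yields both views of the report.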