run benchmark

select a model and a question count, then run the benchmark

questions