(no title)

maxxmini | 12 days ago

Really cool to see a 3B model running fully client-side via WebGPU. The claim that it beats Qwen3-32B on Arena-Hard while being about 10x smaller is interesting. Do you know if those benchmark results hold up on more practical tasks like summarization or instruction following? Also curious about inference speed: what kind of tokens/sec are you seeing on a typical laptop GPU?
