zacksiri | 8 months ago
I tested with 8B model, 14B model and 32B model.
I wanted them to create structured JSON, and the context was quite large, around 60k tokens.
The 8B model failed miserably despite supporting 128k context, the 14B did better, and the 32B got almost everything correct. However, when jumping to a really large model like grok-3-mini, it got it all perfect.
The 8B, 14B, and 32B models I tried were Qwen 3, and I disabled thinking on all the models I tested.
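To make "almost got everything correct" concrete, here's a minimal sketch of how a run like this could be scored against a known expected object (the field names and values are hypothetical, not my actual schema):

```python
import json

def score_extraction(raw: str, expected: dict) -> float:
    """Fraction of expected fields the model got exactly right (0.0 if unparseable)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0  # model emitted something that isn't valid JSON at all
    if not isinstance(data, dict):
        return 0.0
    hits = sum(1 for k, v in expected.items() if data.get(k) == v)
    return hits / len(expected)

# Illustrative run: the model gets 3 of 4 fields right.
expected = {"title": "Post", "author": "zacksiri", "year": 2024, "tags": ["llm"]}
partial = '{"title": "Post", "author": "zacksiri", "year": 2023, "tags": ["llm"]}'
print(score_extraction(partial, expected))  # 0.75
```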
Now, for my agent workflows, I use small models for most tasks (it works quite nicely) and only use larger models when the problem is harder.
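One way to implement that pattern is cheapest-first escalation: try the small model and only fall back to a bigger one when the answer fails validation. This is just a sketch of that idea, not my actual setup; the model names, runner, and validator hook are all placeholders:

```python
from typing import Callable

# Cheapest-first tiers; names are illustrative, not a recommendation.
TIERS = ["qwen3-8b", "qwen3-32b", "grok-3-mini"]

def route(task: str,
          run_model: Callable[[str, str], str],
          is_valid: Callable[[str], bool]) -> tuple[str, str]:
    """Walk up the tiers until a model produces a valid answer."""
    answer = ""
    for model in TIERS:
        answer = run_model(model, task)
        if is_valid(answer):
            return model, answer
    # Every tier failed; return the largest model's attempt anyway.
    return TIERS[-1], answer

# Stub runner: pretend only the 32B model and up can handle this task.
def fake_run(model: str, task: str) -> str:
    return "ok" if model in ("qwen3-32b", "grok-3-mini") else "garbled"

model, answer = route("extract fields", fake_run, lambda a: a == "ok")
print(model)  # qwen3-32b
```

In practice you could also route up front by known task difficulty instead of escalating on failure, which saves the wasted small-model call for tasks you already know are hard.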
v3ss0n | 8 months ago