zacksiri | 8 months ago
I tested with 8B model, 14B model and 32B model.
I wanted them to create structured JSON, and the context was quite large, around 60k tokens.
The 8B model failed miserably despite supporting 128k context, the 14B did better, and the 32B got almost everything correct. However, when jumping to a really large model like grok-3-mini, it got it all perfect.
The 8B, 14B, and 32B models I tried were Qwen 3, and I disabled thinking on all the models I tested.
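To make "almost got everything correct" concrete, here's a minimal sketch of how a run like this could be scored against a known expected object (the field names and values are hypothetical, not my actual schema):

```python
import json

def score_extraction(raw: str, expected: dict) -> float:
    """Fraction of expected fields the model got exactly right (0.0 if unparseable)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0  # model emitted something that isn't valid JSON at all
    if not isinstance(data, dict):
        return 0.0
    hits = sum(1 for k, v in expected.items() if data.get(k) == v)
    return hits / len(expected)

# Illustrative run: the model gets 3 of 4 fields right.
expected = {"title": "Post", "author": "zacksiri", "year": 2024, "tags": ["llm"]}
partial = '{"title": "Post", "author": "zacksiri", "year": 2023, "tags": ["llm"]}'
print(score_extraction(partial, expected))  # 0.75
```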
Now, for my agent workflows, I use small models for most tasks (it works quite nicely) and only use larger models when the problem is harder.
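One way to implement that pattern is cheapest-first escalation: try the small model and only fall back to a bigger one when the answer fails validation. This is just a sketch of that idea, not my actual setup; the model names, runner, and validator hook are all placeholders:

```python
from typing import Callable

# Cheapest-first tiers; names are illustrative, not a recommendation.
TIERS = ["qwen3-8b", "qwen3-32b", "grok-3-mini"]

def route(task: str,
          run_model: Callable[[str, str], str],
          is_valid: Callable[[str], bool]) -> tuple[str, str]:
    """Walk up the tiers until a model produces a valid answer."""
    answer = ""
    for model in TIERS:
        answer = run_model(model, task)
        if is_valid(answer):
            return model, answer
    # Every tier failed; return the largest model's attempt anyway.
    return TIERS[-1], answer

# Stub runner: pretend only the 32B model and up can handle this task.
def fake_run(model: str, task: str) -> str:
    return "ok" if model in ("qwen3-32b", "grok-3-mini") else "garbled"

model, answer = route("extract fields", fake_run, lambda a: a == "ok")
print(model)  # qwen3-32b
```

In practice you could also route up front by known task difficulty instead of escalating on failure, which saves the wasted small-model call for tasks you already know are hard.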
v3ss0n | 8 months ago