sosodev | 10 days ago

Thanks for the additional info. I suspected that MiniMax M2.5 might be a bit too much for this board. 230B-A10B is just a lot to ask of the 395+, even with aggressive quantization, particularly when you consider that the model will spend a lot of tokens thinking, which eats into the comparatively small context window.

I switched from the Unsloth 4-bit quant of Qwen3 Coder Next to the official 4-bit quant from Qwen. Using their recommended settings, I had it running with OpenCode last night and it seemed to be doing quite well: no infinite loops. Given its speed, large context window, and, as you mentioned, willingness to experiment, I think it might actually be the best option for agentic coding on the 395+ for now.

I am curious about https://huggingface.co/stepfun-ai/Step-3.5-Flash given that it does parallel token generation. It might be fast enough despite being similar in size to M2.5. However, it seems there are still some issues that llama.cpp and stepfun need to work out before it's ready for everyday use.
