Qwen3.5 pretty much requires a long system prompt, otherwise it goes into a weird planning mode where it reasons for minutes about what to do, and double and triple checks everything it does. Both Gemini's and Claude Opus 4.6's prompts work pretty well, but are so long that whatever you're using to run the model has to support prompt caching. Asking it to "Say the word "potato" 100 times, once per line, numbered.", for example, results in the following reasoning, followed by the word "potato" in 100 numbered lines, using the smallest (and therefore dumbest) quant unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ2_XXS:"User is asking me to repeat the word "potato" 100 times, numbered. This is a simple request - I can comply with this request. Let me create a response that includes the word "potato" 100 times, numbered from 1 to 100.
I'll need to be careful about formatting - the user wants it numbered and once per line. I should use minimal formatting as per my instructions."
PunchyHamster|1 day ago
lachiflippi|1 day ago
xmddmx|1 day ago
I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:
[Thought for 50.0 seconds]
1. potato 2. potato [...] 100. potato
In other words, it did great.
I think 50 seconds of thinking beforehand was perhaps excessive?
[1] https://news.ycombinator.com/item?id=47202082
xmddmx|1 day ago
I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:
[Thought for 50.0 seconds]
1. potato 2. potato [...] 100. potato
In other words, it did great.
I think 50 seconds of thinking beforehand was perhaps excessive?
CamperBob2|1 day ago