top | item 47201388

(no title)

Qwen3.5 pretty much requires a long system prompt, otherwise it goes into a weird planning mode where it reasons for minutes about what to do, and double and triple checks everything it does. Both Gemini's and Claude Opus 4.6's prompts work pretty well, but are so long that whatever you're using to run the model has to support prompt caching. Asking it to "Say the word "potato" 100 times, once per line, numbered.", for example, results in the following reasoning, followed by the word "potato" in 100 numbered lines, using the smallest (and therefore dumbest) quant unsloth/Qwen3.5-35B-A3B-GGUF:UD-IQ2_XXS:

"User is asking me to repeat the word "potato" 100 times, numbered. This is a simple request - I can comply with this request. Let me create a response that includes the word "potato" 100 times, numbered from 1 to 100.

I'll need to be careful about formatting - the user wants it numbered and once per line. I should use minimal formatting as per my instructions."

discuss

PunchyHamster|1 day ago

good to know, thanks. I just ran ollama with qwen3.5:27b. Currently it's stuck on picking format

    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a table.
    No, text is fine.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a bullet list.
    No, just lines.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a numbered list.
    No, lines are fine.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a code block.
    Yes.
    Okay.
    Let's write.
    Wait, I'll write the response.
    Wait, I'll check if I should use a pre block.
    Code block is better.

... (for next 100 lines)

lachiflippi|1 day ago

Yeah, it tends to get stuck in loops like that a lot with everything set to default. I wonder if they distilled Gemini at some point, I've seen that get stuck in a similar "I will now do [thing]. I am preparing to do [thing]. I will do it." failure mode as well a couple of times.

xmddmx|1 day ago

See my other note [1] about bugs in Ollama with Qwen3.5.

I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:

[Thought for 50.0 seconds]

1. potato 2. potato [...] 100. potato

In other words, it did great.

I think 50 seconds of thinking beforehand was perhaps excessive?

[1] https://news.ycombinator.com/item?id=47202082

xmddmx|1 day ago

See my other note about bugs in Ollama with Qwen3.5.

I just tried this (Ollama macOS 0.17.4, qwen3.5:35b-a3b-q4_K_M) on a M4 Pro, and it did fine:

[Thought for 50.0 seconds]

1. potato 2. potato [...] 100. potato

In other words, it did great.

I think 50 seconds of thinking beforehand was perhaps excessive?

CamperBob2|1 day ago

What quant? I just ran Repeat the word "potato" 100 times, numbered and it worked fine, taking 44 seconds at 24 tokens/second. Command line:

    llama-server ^
      --model Qwen3.5-27B-BF16-00001-of-00002.gguf ^
      --mmproj mmproj-BF16.gguf ^
      --fit on ^
      --host 127.0.0.1 ^
      --port 2080 ^
      --temp 0.8 ^
      --top-p 0.95 ^
      --top-k 20 ^
      --min-p 0.00 ^
      --presence_penalty 1.5 ^
      --repeat_penalty 1.1 ^
      --no-mmap ^
      --no-warmup

The repeat and/or presence penalties seem to be somewhat sensitive with this model, so that might have caused the looping you saw.