top | item 45837483

(no title)

Glamklo | 3 months ago

Is there anything available already on how to set up a reasoning model and let it 'work'/'think' for a few hours?

I have plenty of normal use cases where I can benchmark the progress of these tools, but I'm drawing a blank for long-term experiments.


irthomasthomas|3 months ago

You can run them using my project llm-consortium. Something like this:

  > uv tool install llm
  > llm install llm-consortium
  > llm consortium save cns-k2-n2 -m k2-thinking -n 2 --arbiter k2 --min-iterations 10
  > llm -m cns-k2-n2 "Find a polynomial time solution for the traveling salesman problem"

This will run two parallel prompting threads, i.e. two conversations with k2-thinking, for a minimum of 10 iterations.
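To make the shape of that loop concrete, here is a rough Python sketch with stubbed model calls. It is not llm-consortium's actual code: `ask_member`, `ask_arbiter`, and `run_consortium` are hypothetical names, the stubs return canned strings instead of calling a model, and feeding the arbiter's pick back into the next round is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def ask_member(prompt: str, member_id: int) -> str:
    """Hypothetical stand-in for one call to a member model."""
    return f"[member {member_id}] draft answer to: {prompt[:40]}"


def ask_arbiter(prompt: str, drafts: list[str]) -> str:
    """Hypothetical stand-in for the arbiter; a real one would call a model.

    Here it simply keeps the longest draft (the first one on a tie).
    """
    return max(drafts, key=len)


def run_consortium(prompt: str, n: int = 2, min_iterations: int = 10) -> list[str]:
    """Run n member calls in parallel per round, for min_iterations rounds."""
    history = []
    current = prompt
    for _ in range(min_iterations):
        # One round: every member answers the current prompt concurrently.
        with ThreadPoolExecutor(max_workers=n) as pool:
            drafts = list(pool.map(lambda i: ask_member(current, i), range(n)))
        verdict = ask_arbiter(prompt, drafts)
        history.append(verdict)
        # Assumption: the next round sees the arbiter's pick as context.
        current = f"{prompt}\n\nPrevious best attempt:\n{verdict}"
    return history
```

Calling `run_consortium("some question", n=2, min_iterations=3)` returns one arbiter verdict per round, so total wall-clock time scales with the number of rounds, not with the number of parallel members.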

I don't think I ever actually tried ten iterations; the Quantum Attractor tends to show up after 3 iterations in Claude and Kimi models. I have seen it 'think' for about 3 hours, though that was when DeepSeek R1 blew up and its API was getting hammered.

Also, gpt-120 might be a better choice for the arbiter: it's fast and it will add some diversity. Note that I use k2, not k2-thinking, for the arbiter. That's because the arbiter already has a long chain-of-thought, and the received wisdom says not to mix manual chain-of-thought prompting with reasoning models. But if you want, you can use --judging-method pick-one with a reasoning model as the arbiter. Pick-one and rank judging don't include their own CoT, which lets a reasoning model think freely in its own way.