(no title)
Glamklo | 3 months ago
I have plenty of normal use cases where I can benchmark the progress of these tools, but I'm drawing a blank for long-term experiments.
irthomasthomas|3 months ago
I don't think I ever actually tried ten iterations; the Quantum Attractor tends to show up after 3 iterations with Claude and Kimi models. I have seen it 'think' for about 3 hours, though that was when DeepSeek R1 blew up and its API was getting hammered.
Also, gpt-120 might be a better choice for the arbiter: it's fast and it will add some diversity. Note that I use k2, not k2-thinking, for the arbiter. That's because the arbiter already has a long chain-of-thought, and the received wisdom says not to mix manual chain-of-thought prompting with reasoning models. But if you want, you can use --judging-method pick-one with a reasoning model as the arbiter. Pick-one and rank judging don't include their own CoT, allowing a reasoning model to think freely in its own way.
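To make the pick-one idea concrete, here is a rough Python sketch of what a judging step without its own chain-of-thought could look like. This is not the tool's actual code: `build_pick_one_prompt`, `pick_one`, and the `arbiter` callable are hypothetical names made up for illustration, and the prompt wording is an assumption. The point is only that the prompt asks for a choice and nothing else, so a reasoning model is free to do its thinking however it likes.

```python
import re
from typing import Callable, List


def build_pick_one_prompt(question: str, candidates: List[str]) -> str:
    """Build a bare selection prompt (hypothetical wording).

    There are deliberately no step-by-step instructions here, so the
    prompt does not impose a manual chain-of-thought on the arbiter.
    """
    numbered = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(candidates, start=1)
    )
    return (
        f"Question:\n{question}\n\n"
        f"Candidate answers:\n{numbered}\n\n"
        "Reply with only the number of the best answer."
    )


def pick_one(
    question: str,
    candidates: List[str],
    arbiter: Callable[[str], str],
) -> str:
    """Ask the arbiter to choose one candidate and return that candidate.

    `arbiter` is any function mapping a prompt to the model's reply;
    in real use it would wrap an API call (an assumption, not shown).
    Falls back to the first candidate if the reply has no usable number.
    """
    reply = arbiter(build_pick_one_prompt(question, candidates))
    match = re.search(r"\d+", reply)
    index = int(match.group()) - 1 if match else 0
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]
```

You would call it with something like `pick_one("What is 2+2?", answers, call_my_model)`, where `call_my_model` is whatever function sends a prompt to your arbiter model and returns its text.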