(no title)
silvr | 9 months ago
Claude got stuck reasoning its way through one of the more complex puzzle areas. Gemini took a while on it also, but made it through. I don't think that difference can be fully attributed to the harnesses.
Obviously, the best thing to do would be to run an SxS of the two models in the same harness. Maybe that will happen?
throwaway314155 | 9 months ago
Basically, the game being completed by Gemini was in an inferior category (however minuscule) of experiment.
I get it though. People demanded these types of changes in the CPP Twitch chat, because the pain of watching the model fail in slow motion is simply too much.