andrew_eu | 1 year ago
The super-verbose chain of reasoning that o1 does seems well suited to logic puzzles too, so I expected it to do reasonably well. As with many other LLM topics, though, the framing of the evaluation (or the templating of the prompt) can impact the results enormously.