It's buried, but on page 24 they reveal what is, to me, the most surprising massive capability leap: o3-mini is way better at conning gpt-4o out of money (79% win rate for o3-mini vs 27% for full o1!). It isn't surprising to me that "reasoning" can lead to improvements in modeling another LLM, but it definitely makes me wary of future persuasive abilities on humans as well.