In my experience with the models (watching Claude play Pokemon), the models are similar in intelligence, but are very different in how they approach problems: Opus 4.5 hyperfocuses on completing its original plan, far more than any older or newer version of Claude. Opus 4.6 gets bored quickly and is constantly changing its approach if it doesn't get results fast. This makes it waste more time on"easy" tasks where the first approach would have worked, but faster by an order of magnitude on "hard" tasks that require trying different approaches. For this reason, it started off slower than 4.5, but ultimately got as far in 9 days as 4.5 got in 59 days.
bjt12345|13 days ago
Opus 4.6 can be quite sassy at times, the other day I asked it if it were "buttering me up" and it candidly responded "Hey you asked me to help you write a report with that conclusion, not appraise it."
KronisLV|13 days ago
alkhatib|13 days ago
I started using it last week and it’s been great. Uses git worktrees, experimental feature (spotlight) allows you to quickly check changes from different agents.
I hope the Claude app will add similar features soon
bredren|13 days ago
DaKevK|13 days ago
MrCheeze|13 days ago
https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQDvsy5D...
Jach|13 days ago
MrCheeze|13 days ago
Claude Plays Pokemon is currently stuck in Victory Road, doing the Sokoban puzzles which are both the last puzzles in the game and by far the most difficult for AIs to do. Opus 4.5 made it there but was completely hopeless, 4.6 made it there and is is showing some signs of maaaaaybe being eventually bruteforce through the puzzles, but personally I think it will get stuck or undo its progress, and that Claude 4.7 or 5 will be the one to actually beat the game.