codexon | 24 days ago
And specialised models for programming HAVE plateaued.
https://livebench.ai/#/?sort=Agentic+Coding+Average
Going from Claude 4.1 to 4.5 was only an 18% gain, and from 4.5 to 4.6 the score even DECLINED. Codex 5.1 to 5.2 also shows a decline.