attentive | 6 months ago

> scoring 74.9% on SWE-bench Verified and 88% on Aider polyglot

why isn't it on https://aider.chat/docs/leaderboards/?

"last updated August 07, 2025"

tedsanders | 6 months ago

The 88% is our self-reported score on our internal implementation of Aider polyglot.

The leaderboard score would come from Aider running GPT-5 independently. The two scores should be about the same.

(I work at OpenAI.)