jcuenod|9 months ago

82.2 on Aider

That's actually still behind the official scores for o3 (high). https://aider.chat/docs/leaderboards/

sottol|9 months ago

Does 82.2 correspond to the "Percent correct" of the other models?

Not sure if OpenAI has updated o3, but it looks like "pure" o3 (high) has a score of 79.6% in the linked table, while the "o3 (high) + gpt-4.1" combo has the highest score at 82.7%.

The previous Gemini 2.5 Pro Preview 05-06 (yes, not the current 06-05!) was at 76.9%.

That looks like a pretty nice bump!

But either way, these Aider benchmarks seem to be the most useful/trustworthy benchmarks currently, and really the only ones I'm paying attention to.

vessenes|9 months ago

But so.much.cheaper.and.faster. Pretty amazing.

hobofan|9 months ago

That's the older 05-06 preview, not the new one from today.

energy123|9 months ago

They knew that. The 82.2 comes from the new benchmarks in the OP, not from the Aider URL. The Aider URL was supplied for comparison.