elias_t | 10 months ago

Does anyone have benchmarks comparing this to other models?

cbg0 | 10 months ago

Claude 3.7, no thinking (diff) - 60.4%
Claude 3.7, 32k thinking tokens (diff) - 64.9%
GPT-4.1 (diff) - 52.9% (figure taken from the blog post)

https://aider.chat/docs/leaderboards/