Why SWE-bench Verified no longer measures frontier coding capabilities

10 points | tedsanders | 6 days ago | openai.com

No comments yet.