Why SWE-bench Verified no longer measures frontier coding capabilities

2 points | gmays | 3 days ago | openai.com
