Why SWE-bench Verified no longer measures frontier coding capabilities (openai.com)
10 points | tedsanders | 6 days ago | item 47126205
No comments yet.