item 46978620 | goldenarm | 19 days ago

If you're tired of cross-referencing the cherry-picked benchmarks, here's the geometric mean of SWE-bench Verified & HLE-tools:

Claude Opus 4.6: 65.5%
GLM-5: 62.6%
GPT-5.2: 60.3%
Gemini 3 Pro: 59.1%
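For anyone wanting to reproduce this kind of combined number, here is a minimal sketch of a two-benchmark geometric mean. The function name `combined_score` and the input values are hypothetical placeholders chosen for illustration. They are not the actual per-benchmark scores behind the figures in the post, which the post does not list.

```python
from statistics import geometric_mean  # available in Python 3.8+


def combined_score(swe_bench_pct: float, hle_tools_pct: float) -> float:
    """Geometric mean of two percentage scores: sqrt(a * b).

    Both inputs must be positive. The names here are illustrative only.
    """
    return geometric_mean([swe_bench_pct, hle_tools_pct])


# Hypothetical inputs, NOT real benchmark results.
balanced = combined_score(50.0, 50.0)    # equal scores: result equals the inputs
lopsided = combined_score(64.0, 36.0)    # sqrt(64 * 36), about 48.0

print(f"balanced: {balanced:.1f}%")
print(f"lopsided: {lopsided:.1f}%")
```

Both hypothetical pairs have an arithmetic mean of 50, but the lopsided pair scores lower here. The geometric mean never exceeds the arithmetic mean, so it penalizes a model that is strong on one benchmark and weak on the other.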
No comments yet.