jkelleyrtp | 24 days ago
Claude SWE-bench is 80.8 and Codex is 56.8. Seems like 4.6 is still all-around better?

  gizmodo59 | 24 days ago
  It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.

    joshuahedlund | 24 days ago
    Any ideas why Verified has stagnated? It was increasing rapidly and then basically stopped.

  Rudybega | 24 days ago
  You're comparing two different benchmarks. Pro vs Verified.