jkelleyrtp | 24 days ago

Claude's SWE-bench score is 80.8 and Codex's is 56.8

Seems like 4.6 is still all-around better?

gizmodo59|24 days ago

It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.

joshuahedlund|24 days ago

Any ideas why Verified has stagnated? It was increasing rapidly and then basically stopped.

Rudybega|24 days ago

You're comparing two different benchmarks. Pro vs Verified.