paulhodge | 6 months ago
And I don't think we have a great eval benchmark that exactly measures this capability yet. SWE-bench seems to be pretty good, but there are already a lot of anecdotal comments that Claude is still better at coding than GPT-5, despite the two having similar scores on SWE-bench.
CuriouslyC|6 months ago
https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/o... https://longbench2.github.io/
itsafarqueue|6 months ago
libraryofbabel|6 months ago
pcwelder|6 months ago
My guess for why GPT-5 scores higher on benchmarks is that they evaluate on well-defined tasks with all instructions given at the start.
Real life is multi-turn, with multiple sets of prompts to adhere to. This is where Claude is likely better.