usaar333 | 24 days ago
I'd interpret that as rounding error. That is unchanged. SWE-bench seems really hard once you are above 80%.

Squarex | 24 days ago
It's not a great benchmark anymore... starting with it being primarily Python / Django... the industry should move to something more representative.

usaar333 | 24 days ago
OpenAI has; they don't even mention a score for gpt-5.3-codex. On the other hand, it is their own verified benchmark, which is telling.