archeantus | 10 months ago
“GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.”
4.1 is 26.6% better at coding than 4.5. Got it. Also… see the em dash

    pdabbadabba | 10 months ago
    What's wrong with the em-dash? That's just... the typographically correct dash AFAIK.

        clbrmbr | 10 months ago
        Maybe a reference to the OpenAI models loving to output em-dashes?

    drexlspivey | 10 months ago
    Should have named it 4.10

        clbrmbr | 10 months ago
        But it’s so much weaker than 4.5 in broader tasks… maybe more optimized against benchmarks but it’s just no replacement for a huge model.