top | item 46978614


mohas | 18 days ago

I kinda feel this benchmarking thing with Chinese models is like university Olympiads: they specifically study for those, but when the time comes for real-world work they seriously lag behind.


OsrsNeedsf2P | 18 days ago

I kinda feel like the goalposts are shifting. While we're not there yet, in a world where Chinese models surpass Western ones, HN will be nitpicking edge cases long after the ship has sailed.

Oras | 18 days ago

I don’t think it’s undermining the effort and improvement, but the usability of these models usually isn’t what their benchmarks suggest.

Last time there was hype about a GLM coding model, I tested it on some coding tasks and it wasn’t usable compared with Sonnet or GPT-5.

I hope this one is different.