Comparison to GPT-OSS-20B (regardless of how you feel that model actually performs) doesn't fill me with confidence. Given that GLM 4.7 seems like it could be competitive with Sonnet 4/4.5, I would have hoped their flash model would run circles around GPT-OSS-120B. I do wish they would provide an Aider result for comparison. Aider may be saturated among SotA models, but it's not at this size.
unsupp0rted|1 month ago
Not for code. The quality is so low it's roughly on par with Sonnet 3.5.