dbagr|7 months ago
Either they overtook other LLMs simply by using more compute (which is plausible, given how many GPUs they have), or I'm willing to bet there is benchmark contamination. I don't think their engineering team came up with better techniques than those used to train other LLMs, and Elon has a history of making deceptive announcements.
z7|7 months ago
https://x.com/arcprize/status/1943168950763950555
saberience|7 months ago
What I've noticed when testing previous versions of Grok: on paper they were better at benchmarks, but in practice the responses were always worse than Sonnet's and Gemini's, even though Grok had higher benchmark scores.
Occasionally I test Grok to see if it could become my daily driver, but it's never produced better answers than Claude or Gemini for me, regardless of what their marketing shows.