dwohnitmok | 9 days ago
> then all we can conclude is that Gemini can play chess well, and we cannot generalize to other LLMs who play about the level of random bot
That's overly reductive. It would be true if we didn't see improvement over time from the other LLMs, but we clearly do. In particular, even if Gemini is benchmarkmaxxing, that means LLMs from other labs will eventually get there as well. Benchmarkmaxxing can be thought of as reaching a benchmark "prematurely." But I can't think of a single benchmark that was benchmarkmaxxed and not eventually saturated by every LLM provider: being able to benchmarkmaxx is an existence proof that an LLM capable of it exists, and as the other labs keep training, their models get there too.
runarberg | 8 days ago
The claim here (way up thread) was: “we have the technology to train models to do anything that you can do on a computer, only thing that's missing is the data”, with the implication that logic and reasoning are emergent properties of these models given enough data and enough parameters. The evidence seems to suggest otherwise. Logic and reasoning have to be specifically programmed into these models, and even with a dataset as vast as online chess games (Lichess alone has 7.1 billion games), chess should be easy for LLMs if that claim were true, but it obviously isn’t. And that tells us something about the limitations of the technology.
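One concrete way to test this is to measure how often a model even proposes a *legal* move when continuing real games. Here's a minimal sketch (not from anyone in this thread) using the python-chess library; `query_llm` is a hypothetical stand-in for whatever model API you'd actually call:

```python
import chess

def query_llm(pgn_prefix: str) -> str:
    """Hypothetical stand-in: ask the model for the next move in SAN, e.g. 'Nf3'."""
    raise NotImplementedError("plug in your model API here")

def legal_move_rate(games, max_plies=40):
    """Fraction of model-proposed moves that are legal, over many positions."""
    legal = total = 0
    for moves in games:                      # each game is a list of SAN moves
        board = chess.Board()
        prefix = []
        for san in moves[:max_plies]:
            proposal = query_llm(" ".join(prefix))
            total += 1
            try:
                board.parse_san(proposal)    # raises ValueError if illegal here
                legal += 1
            except ValueError:
                pass
            board.push_san(san)              # follow the actual game record
            prefix.append(san)
    return legal / total if total else 0.0
```

A bot picking uniformly at random from the legal moves scores 100% on this metric, so it's a floor check rather than a strength measure; playing the model against an actual random mover in the same harness is what gets at the "level of random bot" comparison quoted above.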