(no title)
Ireallyapart | 2 years ago
LLMs understand it to a certain extent. It's more than "predicting" the next token. Calling it just "predicting the next token" is a naive description people use to cover up what they don't understand.
I mean, you can describe a human brain as simply wetware: a jumble of signals and chemical reactions that twitch muscles and react to pressure waves in the air and light. But obviously a higher-level description of the brain is missing from that picture.
The same thing could be said about LLMs. I can tell you this: researchers completely understand token prediction; that much can be said. What we don't currently understand is the high-level description. Perhaps it's not something we can understand, since we've never been able to understand human consciousness at a high level either.
That's the thing with people. Nobody actually understands the high-level description of a fully trained LLM. People lambast others because they "think" they understand, when they only understand the low-level primitives. We understand assembly, but that doesn't mean we understand the operating system written in assembly.
Take this for example:
Me: 4320598340958340958340953095809348509348503480958340958304985038530495830 + 1
chatGPT: 4320598340958340958340953095809348509348503480958340958304985038530495830 + 1 equals 4320598340958340958340953095809348509348503480958340958304985038530495831.
The chance of chatGPT memorizing, or even luckily predicting, the next tokens here is too low to even consider. There are so many possible numbers, including wrong answers that would have a "higher probability" of being close to the truth from a token/edit-distance standpoint. It's safe to say, from a scientific standpoint, that chatGPT in this scenario understands what it means to add 1. Realize that this number is far too large for fixed-width machine integers; chatGPT needs symbolic understanding to perform the feat it did above.
But there are, of course, things it gets wrong. And again, we don't truly understand what's going on here. Is it lying to us? Perhaps it can't differentiate between a merely statistical token and an actual math equation. It's hard to say. But from the example above, by probability, we know that some aspect of true understanding and ability exists.
BobbyJo|2 years ago
<numbers>0 + 1 -> <numbers>1
Even simple attention mechanisms would handle that quite well, given enough examples of <numbers>.
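The pattern BobbyJo describes is purely local: because the example number doesn't end in 9, no carry propagates, so "+ 1" reduces to copying every digit and bumping the last one. A minimal string-level sketch (the function name is made up for illustration):

```python
def bump_last_digit(digits: str) -> str:
    """Copy all digits unchanged and increment the final one.

    This local rewrite rule gives the correct answer for "+ 1" whenever
    the number does not end in 9, i.e. whenever no carry propagates --
    which is exactly the case in the ChatGPT example above. A model
    could learn this copy-and-bump pattern without any general notion
    of arithmetic.
    """
    assert digits[-1] != "9", "a trailing 9 would require carry propagation"
    return digits[:-1] + str(int(digits[-1]) + 1)

print(bump_last_digit(
    "4320598340958340958340953095809348509348503480958340958304985038530495830"
))
```

A trailing 9 (e.g. "199 + 1") breaks the rule, which is why carry-chain cases are the more interesting probe of whether a model actually adds.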
doctor_eval|2 years ago
I'm too lazy to get it to add two large numbers together.
Also, I've never been convinced that "ability to do arithmetic" has any relationship to intelligence. We don't expect regular humans to be able to add two large numbers together reliably, either.
hyperliner|2 years ago
[deleted]