(no title)
Ireallyapart | 2 years ago
LLMs understand it to a certain extent. It's more than "predicting" the next token. Calling it just "predicting the next token" is a naive description people use to cover up what they don't understand.
I mean, you can describe a human brain as simply wetware: a jumble of signals and chemical reactions that twitch muscles and react to pressure waves in the air and light. But obviously a higher-level description of the brain is missing from that picture.
The same thing could be said about LLMs. I can tell you this: researchers completely understand token prediction; that much can be said. What we don't currently understand is the high-level description. Perhaps it's not something we can understand, since we've never been able to understand human consciousness at a high level either.
That's the thing with people. Nobody actually understands the high-level description of a fully trained LLM. People lambast others because they "think" they understand, when they only understand the low-level primitives. We understand assembly, but that doesn't mean we understand the operating system written in assembly.
Take this for example:
Me: 4320598340958340958340953095809348509348503480958340958304985038530495830 + 1
chatGPT: 4320598340958340958340953095809348509348503480958340958304985038530495830 + 1 equals 4320598340958340958340953095809348509348503480958340958304985038530495831.
The chance of chatGPT memorizing, or even luckily predicting, the next tokens here is too low to even consider. There are so many possible numbers, including wrong answers that would have a "higher probability" of being close to the truth from a token/edit-distance standpoint. It's safe to say, from a scientific standpoint, that chatGPT in this scenario understands what it means to add 1. Realize that this number is far too large for fixed-width machine integers; chatGPT needs symbolic understanding to perform the feat it did above.
But there are, of course, things it gets wrong. And again, we don't truly understand what's going on here. Is it lying to us? Perhaps it can't differentiate between a merely statistical token and an actual math equation. It's hard to say. But from the example above, by probability, we know that some aspect of true understanding and ability exists.
BobbyJo|2 years ago
<numbers>0 + 1 -> <numbers>1
Even simple attention mechanisms would handle that quite well, given enough examples of <numbers>.
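The pattern BobbyJo describes is purely local: because the example number doesn't end in 9, no carry propagates, so "+ 1" reduces to copying every digit and bumping the last one. A minimal string-level sketch (the function name is made up for illustration):

```python
def bump_last_digit(digits: str) -> str:
    """Copy all digits unchanged and increment the final one.

    This local rewrite rule gives the correct answer for "+ 1" whenever
    the number does not end in 9, i.e. whenever no carry propagates --
    which is exactly the case in the ChatGPT example above. A model
    could learn this copy-and-bump pattern without any general notion
    of arithmetic.
    """
    assert digits[-1] != "9", "a trailing 9 would require carry propagation"
    return digits[:-1] + str(int(digits[-1]) + 1)

print(bump_last_digit(
    "4320598340958340958340953095809348509348503480958340958304985038530495830"
))
```

A trailing 9 (e.g. "199 + 1") breaks the rule, which is why carry-chain cases are the more interesting probe of whether a model actually adds.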
doctor_eval|2 years ago
I'm too lazy to get it to add two large numbers together.
Also, I've never been convinced that "ability to do arithmetic" has any relationship to intelligence. We don't expect regular humans to be able to add two large numbers together reliably, either.
hyperliner|2 years ago
[deleted]