Arkhaine_kupo | 3 months ago
It's unsurprising that a company heavily invested in LLMs would describe clustered information as a world model, but it isn't one. Transformer models, whether for video or for text, don't have the machinery you would need for a world model. They can mimic some level of consistency as long as the context window holds, but that disappears the second the information leaves that space.
In terms of human cognition, it's like the difference between short-term memory, long-term memory, and being able to see the stuff in front of you. A human instinctively knows the relative weight, direction, and size of objects, and if a ball rolls behind a chair you still know it's there three days later. A transformer model cannot do any of those things; at best it can remember the ball behind the chair until enough new information pushes it out of the context window, at which point it cannot reappear.
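The context-window point above can be sketched as a toy illustration. This is a deliberate caricature (a FIFO of "facts", not an actual attention mechanism); the class and capacity are hypothetical, made up for the sketch:

```python
from collections import deque

# Toy caricature of a fixed-capacity context window: a FIFO of facts.
# Once enough new input arrives, the oldest fact is evicted and cannot
# be recovered -- the only "memory" is whatever currently fits.
class ContextWindow:
    def __init__(self, capacity):
        self.window = deque(maxlen=capacity)

    def observe(self, fact):
        self.window.append(fact)

    def recalls(self, fact):
        return fact in self.window

ctx = ContextWindow(capacity=3)
ctx.observe("ball rolled behind the chair")
print(ctx.recalls("ball rolled behind the chair"))  # True

# A few days of unrelated input push the ball out of the window.
for fact in ["made coffee", "read email", "went to sleep"]:
    ctx.observe(fact)
print(ctx.recalls("ball rolled behind the chair"))  # False
```

A human's world model keeps the ball behind the chair regardless of how much unrelated input arrives; the FIFO above, like a context window, simply loses it.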
> putting a word at the end of the second line of a poem based on the rhyme they need for the last)
That is the kind of work that exists inside its context window. Feed it a 400-page book, which any human could easily read, digest, parse, and understand, limit it to a single pass, and ask questions about different chapters. You will quickly see it make shit up that fits the information given previously rather than the original text.
> We really don't know how these things tick
I don't know enough about the universe either. But if you told me there are particles smaller than the Planck length, or others that travel faster than the speed of light, I would tell you that cannot happen under the basic laws of the universe. (I know there are studies on FTL neutrinos and dark matter, but in general terms, if you said you saw carbon going FTL I wouldn't believe you.)
Similarly, transformer models are cool, and emergent properties are super interesting to study in larger data sets. Adding tools on the side for deterministic work helps a lot, and agentic multimodal use is fun. But a transformer does not and cannot have a world model as we understand it; Yann LeCun left Meta because he wants to work on world-model AIs rather than transformer models.
> If you put a person in charge of predicting which direction a fish will be facing in 5 minutes,
What that human will never do is think the fish is gone because it swam inside the castle and out of sight. A transformer would.
ToValueFunfetti | 3 months ago
Long-term memory and object permanence don't seem necessary for thought. A 1-year-old can think, as can a late-stage Alzheimer's patient. Neither could get through a 400-page book, but that's irrelevant.
Listing human capabilities that LLMs don't have doesn't help unless you demonstrate these are prerequisites for thought. Helen Keller couldn't tell you the weight, direction, or size of a rolling ball, but this is not relevant to the question of whether she could think.
Can you point to the speed-of-light analogy laws that constrain how LLMs work in a way that excludes the possibility of thought?
Arkhaine_kupo | 3 months ago
A world model in AI has a specific definition: an internal representation that the AI can use to understand and simulate its environment.
> Long-term memory and object permanence don't seem necessary for thought. A 1-year-old can think, as can a late-stage Alzheimers patient
Both of those cases have long-term memory and object permanence; they just have a developing memory or memory issues. But their issues are not constrained by a context window. Children develop object permanence in the first 8 months, and, much like distinguishing their own body from their mother's, that is them developing a world model. Toddlers are not really thinking; they are responding to stimulus. They feel hunger, they cry. They hear a loud sound, they cry. It's not really them coming up with a plan to get fed or get attention.
> Listing human capabilities that LLMs don't have doesn't help unless you demonstrate these are prerequisites for thought. Helen Keller couldn't tell you the weight, direction, or size of a rolling ball
Helen Keller had an understanding in her mind of what different objects were; she started communicating because she understood the word water when her teacher traced it on her palm.
Most humans have multiple sensory inputs (sight, smell, hearing, touch); she had only one, which is perhaps closer to an LLM. But capacities she had that LLMs don't have include agency, planning, long-term memory, etc.
> Can you point to the speed-of-light analogy laws that constrain how LLMs work in a way that excludes the possibility of thought?
Sure, let me switch the analogy if you don't mind. In the Chinese room thought experiment, a man receives a message, opens a Chinese dictionary, and translates it perfectly word by word, and the person on the other side receives and reads a perfect Chinese message.
The argument usually turns on whether the person inside the room "understands" Chinese if he is capable of producing perfect 1:1 Chinese messages.
But an LLM is that man, and what you cannot argue is that the man is THINKING. He is mechanically going to the dictionary and returning a message that can pass as human-written because the book is accurate (if the vectors and weights are well tuned). He is not an agent; he simply does. He is not creating a plan or doing anything beyond transcribing the message as the book demands.
He doesn't have a mental model of the Chinese language, and he cannot formulate his own ideas or execute a plan based on predicted outcomes; he can do nothing but perform the job perfectly and boringly, as per the book.
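The mechanical operator described above can be caricatured as pure table lookup. This is a deliberately crude sketch of the argument, not of how an actual LLM works (a real model computes a learned function over token probabilities, not a literal lookup); the phrasebook entries are made up for illustration:

```python
# Caricature of the Chinese-room operator: a purely mechanical
# mapping from input to output, with no model of what either means.
phrasebook = {
    "你好": "Hello",
    "谢谢": "Thank you",
}

def room_operator(message):
    # Follows the book exactly: no plan, no goal, no understanding
    # of Chinese -- just a rule applied to whatever comes in.
    return phrasebook.get(message, "(no rule for this message)")

print(room_operator("你好"))  # Hello
print(room_operator("再见"))  # (no rule for this message)
```

The point of the analogy is that flawless output (the first call) tells you nothing about understanding, and the operator has no recourse at all when the book has no rule (the second call).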