An LLM has an internal linguistic model (i.e. it knows token patterns), and that linguistic model models humans' linguistic models (streams of tokens) of their actual world models (which involve far, far more than linguistics and tokens: logical relations beyond mere semantic relations, sensory representations like imagery and sounds, and, yes, words and concepts).
So LLMs are linguistic (token pattern) models of linguistic models (streams of tokens) describing world models (more than tokens).
It thus does not in fact follow that LLMs model the world (as they are missing everything that is not encoded in linguistic semantics).
At this point, anyone claiming that LLMs are "just" language models isn't arguing in good faith. LLMs are a general-purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Tokens can represent anything, not just words. Roughly the same architecture can generate passable images, music, or even video.
In this case this is not so. The primary model is not a model at all, and the surrogate has bias added to it. It's also missing any way to actually check the internal consistency of statements or otherwise combine information from its corpus, so it fails as a world model.
D-Machine|20 days ago
hackinthebochs|19 days ago
[1] https://x.com/karpathy/status/1582807367988654081
tovej|21 days ago