top | item 43498609

dev_throwaway | 11 months ago

This is not a bad way of looking at it. If I may add a bit: the LLM is a solid-state system. The only thing that survives from one iteration to the next is the single highest-ranking token, and the entire state and "thought process" of the network cannot be represented by a single token. That means every strategy is encoded into it during training, as a lossy representation of the training data. By definition that is a database, not a thinking system: the strategy is stored, not actively generated during use.
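The picture described above (only one token id crossing each step boundary) can be sketched as a greedy decoding loop. Everything here is invented for illustration: `fake_logits` is a stand-in for a real forward pass, and the vocabulary size is arbitrary.

```python
import numpy as np

# Toy greedy-decoding loop. At each step the model produces logits over the
# vocabulary, and only the single argmax token id is appended to the context
# and fed back into the next iteration.
# `fake_logits` is a made-up stand-in, not a real LLM forward pass.

VOCAB = 50  # hypothetical vocabulary size

def fake_logits(token_ids):
    # Deterministic pretend logits derived from the context.
    h = sum(token_ids) % VOCAB
    return np.roll(np.arange(VOCAB, dtype=float), h)

context = [7]  # prompt, as token ids
for _ in range(5):
    logits = fake_logits(context)
    next_id = int(np.argmax(logits))  # the single highest-ranking token
    context.append(next_id)           # only this id crosses the step boundary

print(context)
```

Note this only describes the sampling interface; as the reply below points out, the cached keys and values of earlier tokens also persist across steps inside the network.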

The anthropomorphization of LLMs bothers me. We don't need to pretend they are alive and thinking. At best that is marketing; at worst, by training the models to output human-sounding conversation, we are actively taking away the true potential these models could achieve if we were okay with them being "simply a tool".

But pretending that they are intelligent is what brings in the investors, so that is what we are doing. This paper is just furthering that agenda.

Philpax | 11 months ago

> The only thing that survives from one iteration to the next is the singular highest ranking token, the entire state and "thought process" of the network cannot be represented by a single token, which means that every strategy is encoded in it during training, as a lossy representation of the training data.

This is not true. The key-values of previous tokens encode computation that can be accessed by attention, as mentioned by colah3 here: https://news.ycombinator.com/item?id=43499819

You may find https://transformer-circuits.pub/2021/framework/index.html useful.

dev_throwaway | 11 months ago

This is an optimization to prevent redundant calculations. If it were not performed, the result would be the same, just served slightly slower.
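This equivalence can be checked with a toy single-head causal attention layer: incremental decoding with cached keys/values produces the same outputs as recomputing attention over the whole prefix at every step. The dimensions and random weights below are hypothetical, chosen only to make the comparison runnable.

```python
import numpy as np

# Toy single-head self-attention, showing the KV cache is a pure optimization:
# cached incremental decoding matches full recomputation exactly.
rng = np.random.default_rng(0)
d = 8   # model/head dimension (hypothetical)
T = 5   # prefix length (hypothetical)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
x = rng.standard_normal((T, d))  # token embeddings for the prefix

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # q: (d,), K, V: (t, d) -> attention output of shape (d,)
    scores = K @ q / np.sqrt(d)
    return softmax(scores) @ V

# 1) No cache: recompute keys and values for the full prefix at every step.
no_cache = np.stack([
    attend(x[t] @ Wq, x[:t + 1] @ Wk, x[:t + 1] @ Wv) for t in range(T)
])

# 2) With cache: append one new key/value row per step, reuse the rest.
K_cache = np.zeros((0, d))
V_cache = np.zeros((0, d))
cached = []
for t in range(T):
    K_cache = np.vstack([K_cache, x[t] @ Wk])
    V_cache = np.vstack([V_cache, x[t] @ Wv])
    cached.append(attend(x[t] @ Wq, K_cache, V_cache))
cached = np.stack(cached)

assert np.allclose(no_cache, cached)  # identical results, less recomputation
```

The no-cache path does O(t) redundant key/value projections per step, which is exactly the work the cache avoids.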

The paper you linked is a great one; I was all over it a few years back when we built our first models. It should be recommended reading for anyone interested in CS.

kazinator | 11 months ago

People anthropomorphize LLMs because that's the most succinct language for describing what they seem to be doing. To avoid anthropomorphizing, you would have to use more formal language, which would obscure the concepts.

Anthropomorphic language has been woven into AI since its early beginnings.

AI programs were said to have goals, and to plan and hypothesize.

They were given names like "Conniver".

The word "expert system" anthropomorphizes! It's literally saying that some piece of logic programming loaded with a base of rules and facts about medical diagnosis is a medical expert.