top | item 44131213


JoshCole | 9 months ago

> it can't "think" of something entirety conceptionally new, because it doesn't really "think".

Hierarchical optimization (fast global + slow local) is a precise, implementable notion of "thinking." Whenever I've seen this pattern implemented, humans converge, unprompted, on the verb "think" to describe the operation. If you want to think clearly about this subject, blacklist the term "think" and avoid it altogether, because you are allowing a confusion of language to come between you and the mathematical objects under discussion.
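
The fast/slow pattern can be sketched as a toy two-level loop; everything here is illustrative (the quadratic, the learning rates), not any particular system:

```python
# A minimal sketch of hierarchical optimization: a fast inner loop that
# optimizes a candidate solution, and a slow outer loop that adjusts the
# inner loop's settings based on how well it did.

def inner_optimize(lr, steps=20):
    """Fast loop: gradient descent on f(x) = (x - 3)^2, starting at x = 0."""
    x = 0.0
    for _ in range(steps):
        grad = 2 * (x - 3)
        x -= lr * grad
    return x, (x - 3) ** 2  # final point and its loss

def outer_search(candidate_lrs):
    """Slow loop: keep whichever inner-loop setting achieved the lowest loss."""
    best_lr, best_loss = None, float("inf")
    for lr in candidate_lrs:
        _, loss = inner_optimize(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

best_lr, best_loss = outer_search([0.01, 0.1, 0.5])
```

The slow loop never touches `x` directly; it only evaluates and steers the fast loop, which is the shape the comment is pointing at.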

> It can produce "new" stuff only by combining the "old" stuff in new ways,

False premise; previously debunked. Here is a refutation anyway, made more extreme. Instead of training on a pre-training predictive dataset objective, train only against a provided reward model. Such a setup never shows "old" stuff to the AI, because the AI is never shown any data at all. It only ever generates new things, and the reward model judges how well it did. Since it can generate while knowing nothing, and by definition everything it produces at that point is new, the claim that it can never generate something new is clearly false. Notice also that as it keeps generating and being judged, it will learn concepts.
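
A toy version of this setup, with an illustrative reward function standing in for a learned reward model; the "model" is never shown any data, it only generates and gets a score back:

```python
import random

random.seed(0)
tokens = ["a", "b", "c", "d"]
weights = {t: 1.0 for t in tokens}  # uniform to start: it "knows nothing"

def reward(t):
    # The judge. Its preference (here: "c") is never shown to the generator.
    return 1.0 if t == "c" else 0.0

for _ in range(200):
    total = sum(weights.values())
    t = random.choices(tokens, [weights[x] / total for x in tokens])[0]
    weights[t] += 0.5 * reward(t)  # reinforce whatever the judge liked

best = max(weights, key=weights.get)
```

After the loop, `best` is the token the judge rewarded, even though no dataset of "old" examples was ever provided.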

> But LLM's don't have an understanding of things, they are statistical models that predict what statistically is most likely following the input that you give it.

Try out Jaynes's Probability Theory: The Logic of Science. In it, the underpinning assumptions that lead to probability theory are shown to be reasonable, natural, and obviously good: represent plausibility with real numbers, keep rankings consistent and transitive, reduce to Boolean logic at certainty, and update so you never accept a Dutch-book sure loss. Together these force the ordinary sum and product rules of probability. Then notice that statistics is, in a certain sense, just what happens when you apply the rules of probability.
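
The rules those desiderata force can be stated compactly; this is a from-memory sketch of the shape of Jaynes's (Cox-style) result, not his full derivation:

```latex
% From the desiderata (real-valued plausibilities, consistency across
% equivalent routes of reasoning, agreement with Boolean logic at
% certainty), one is forced, up to rescaling, to:
P(A \mid C) + P(\bar{A} \mid C) = 1              % sum rule
P(AB \mid C) = P(A \mid C)\, P(B \mid AC)        % product rule
% Bayes' theorem then falls out of the product rule:
P(A \mid BC) = \frac{P(A \mid C)\, P(B \mid AC)}{P(B \mid C)}
```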

> also having a understanding of certain concepts that allows you to arrive at new points like C, D, E, etc. But LLM's don't have an understanding of things

This is also false. Look into the line of research that goes by the name Circuits. It has been found that models have structures within their weights that do correspond to concepts. You probably don't understand what concepts are: abstractions and concepts are basically forms of compression that let you treat different things as the same thing. So a different way to arrive at the same conclusion is to consider a model with fewer parameters than there are items in its dataset, and notice that the model must successfully compress the dataset in order to meet its objective.
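
The compression argument is a counting argument, and can be sketched with illustrative numbers:

```python
# Pigeonhole sketch: a model whose parameters can encode fewer distinct
# states than there are distinct datasets cannot memorize every dataset
# bit-for-bit, so doing well on its objective requires exploiting shared
# structure, i.e. compression. The sizes below are illustrative.

param_bits = 16          # a tiny model: 16 bits of parameters total
items = 1000             # dataset items, each a single binary label

representable_models = 2 ** param_bits   # distinct weight settings
possible_datasets = 2 ** items           # distinct label assignments

must_compress = representable_models < possible_datasets
```

Since `must_compress` is true by an enormous margin, low loss is only achievable on datasets whose labels have structure the model can exploit, which is what having a concept amounts to under this view.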



gitaarik|9 months ago

Yes, OK, it can generate new stuff, but it depends on human-curated reward models to score the output and make it usable. So it still depends on human thinking; its own "thinking" is not sufficient. And there won't be a point when human-curated reward models are no longer needed.

LLMs will make a lot of things easier for humans, because most of the thinking humans do has been automated into the LLM. But ultimately you run into a limit where the human has to take over.

JoshCole|9 months ago

> dependent on human curated reward models to score the output to make it usable.

This is a false premise, because there already exist systems, currently deployed, which are not dependent on human-curated reward models.

Refutations of your point include existing systems which generate a reward model based on some learned AI scoring function, allowing self-bootstrapping toward higher and higher levels.

A different refutation of your point is the use of existing simulation contexts, for example by R1, in which compilation of code is used as a reward signal; there the reward model comes from a simulator, not a human.
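
The simulator-as-judge idea fits in a few lines. Here Python's built-in compile() plays the role of the toolchain; R1-style systems use real compilers and test harnesses, but the point is the same: the score comes from a machine check, not a human rater.

```python
def compile_reward(source: str) -> float:
    """Score a candidate program by whether the 'simulator' accepts it."""
    try:
        compile(source, "<candidate>", "exec")
        return 1.0  # accepted: reward with no human in the loop
    except SyntaxError:
        return 0.0  # rejected, again with no human judgment involved

compile_reward("x = 1 + 1")  # valid Python, rewarded
compile_reward("x = = 1")    # syntax error, not rewarded
```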

> So it still depends on human thinking

Since your premise was false, your corollary does not follow from it.

> And there won't be a point when human curated reward models are not needed anymore.

This is just a repetition of your earlier false statement, not a new argument. You're probably becoming increasingly overconfident by restating falsehoods in different words, giving the impression of a more substantive argument than you've actually made.

gitaarik|9 months ago

So, to clarify: it could potentially come up with (something close to) C, but if you want it to get to D, E, F, etc., it will become less and less accurate with each consecutive step, because it lacks human-curated reward models past that point. Only if you create new reward models for C will the output for D improve, and so on.

vidarh|9 months ago

> And there won't be a point when human curated reward models are not needed anymore.

This doesn't follow at all. There's no reason why a model cannot be made to produce reward models.