luisml77 | 4 months ago

Awareness is just continuous propagation of the neural network, be that artificial or biological. The reason thoughts just "appear" is that the brain is continuously propagating signals through the neural network. LLMs also do this during their decoding phase, where they reason continuously with every token they generate. There is no difference here. Then you say "we don't think most of the time using language exclusively", but neither do LLMs. What most people fail to realise is that between each generated token, black magic is happening inside the transformer layers. The same kind of magic you describe: high dimensional, based on complex concepts, merging of ideas, fusion of vectors to form combined concepts, smart compression, application of abstract rules. An LLM does all of these things, and more, and you can prove this by how complex its output is. Or you can read Anthropic's interpretability studies on how LLMs do math inside the transformer layers, and how they manipulate information.
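The "reasoning with every token" point can be made concrete with a toy decode loop. Everything below is made up for illustration (random weights, a mean over embeddings standing in for attention; real transformers attend over the full context), but it shows the structure: a complete forward pass through the network runs before every single emitted token.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50, 8
embed = rng.normal(size=(vocab, d))    # token id -> embedding vector
W = rng.normal(size=(d, d))            # stand-in for the whole transformer stack
unembed = rng.normal(size=(d, vocab))  # hidden state -> logits over the vocabulary

ids = [1]                              # prompt: a single start token
for _ in range(5):
    h = embed[ids].mean(axis=0)        # crude summary of the context so far
    h = np.tanh(h @ W)                 # a full pass through the network runs...
    ids.append(int(np.argmax(h @ unembed)))  # ...before each emitted token
```

The loop body, not the final `argmax`, is where all the computation happens; "next-token prediction" is just how the result is read out.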

AGI is not here with LLMs, but it's not because they lack reasoning ability. It's due to something different. Here is what I think is truly missing: continuous learning, long-term memory, and unbounded, efficient context/operation. All of these are deeply tied together, and thus I believe we are but a single breakthrough away from AGI.

10weirdfishes|4 months ago

There are very significant differences between biological and artificial neural networks. Artificial neural networks are mathematical attempts to replicate how the brain's neurons work; they are not, and were never meant to be, 1-to-1 replications. There is the difference in scale, where the "parameter count" of human neural networks absolutely dwarfs that of the current LLMs we have today. There is also the fact that they are materially different: the underlying biology and cell structure affects biological neural networks in ways that artificial neural networks simply don't have access to.

The idea of awareness being propagations through the NN is an interesting concept though. I wonder if this idea could be tested by monitoring the electrical signals within the brain.

luisml77|4 months ago

People like to focus on the differences between the brain and artificial neural networks. I myself believe the only thing that truly matters is that you can form complex functions out of the common neuron element. This is achieved by linking lots of them together, and by each one having a property known as non-linearity. These two things ensure that with neurons you can approximate just about any linear or non-linear function or behaviour. This means you can simulate inside your network pretty much any reality within this universe, its causes and its effects. The deeper your network, the more complex the reality you can "understand". Understanding here just means simulating: running inputs to get outputs in a way that matches the real phenomenon. When someone is said to be "smart", it means they possess a set of rules and functions that can very accurately predict a reality.

You mention scale, and while it's true that the brain has more neuron elements than any LLM, it's also true that the brain is sparser, meaning far fewer of its neurons are active at the same time. For a fairer comparison, you can also remove the motor cortex from the discussion and talk just about the networks that reason. I believe the scale is comparable.

In essence, I think it doesn't matter that the brain has a whole bunch of chemistry added into it that artificial neural networks don't. The underlying deep non-linear function mapping capability is the same, and I believe this depth is, in both cases, comparable.
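The non-linearity point is easy to demonstrate with a toy NumPy sketch (the weights are hand-picked for illustration): stacked linear layers always collapse into a single linear map, but inserting one ReLU lets the same two hidden neurons compute |x|, a non-linear function no purely linear network can represent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = np.linspace(-2, 2, 9)

W1 = np.array([[1.0], [-1.0]])   # hidden layer: two neurons, computing x and -x
W2 = np.array([[1.0, 1.0]])      # output layer: sums the two hidden units

# Without a non-linearity, the two layers collapse into one linear map
# (here W2 @ W1 == 0, so the output is identically zero):
linear_out = (W2 @ (W1 @ x.reshape(1, -1))).ravel()
assert np.allclose(linear_out, (W2 @ W1).item() * x)

# With ReLU in between, the same weights compute relu(x) + relu(-x) = |x|:
nonlinear_out = (W2 @ relu(W1 @ x.reshape(1, -1))).ravel()
assert np.allclose(nonlinear_out, np.abs(x))
```

Depth then compounds this: each layer composes new non-linear pieces out of the previous ones, which is the function-approximation property the argument above leans on.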

laterium|4 months ago

Why would it have to be a 1-to-1 replication? Isn't that a strawman? NNs can store basically the collective knowledge of humanity in that minuscule number of neurons. NNs also run at a much, much higher frequency than human brains. Does that make human brains inferior, and not worthy of being called aware, by the same line of argument? Why do these differences even matter? I can imagine a form of awareness vastly different from humans' just fine. They can both be aware without being that similar.

emptysongglass|4 months ago

> Awareness is just continuous propagation of the neural network, be that artificial or biological. The reason thoughts just "appear" is because the brain is continuously propagating signal through the neural network.

This is just a claim you are making, without evidence.

The way you understand awareness is not through "this is like that" comparisons. These comparisons fall over almost immediately as soon as you turn your attention to the mind itself, by observing it for any length of time. Try it. Go observe your mind in silence for months. You will observe for yourself it is not what you've declared it to be.

> An LLM does all of these things, and more, and you can prove this by how complex their output is.

Complex output does not prove anything. You are again just making claims.

It is astoundingly easy to push an LLM over to collapse into ungrounded nonsense. Humans don't function this way because the two modes of reasoning are not alike. It's up to those making extraordinary claims to prove otherwise. As it is, the evidence does not exist that they behave comparably.

2OEH8eoCRo0|4 months ago

Why do half the people on this topic not understand what subjective experience is?

It's immaterial and not measurable thus possibly out of reach of science.

antonvs|4 months ago

> This is just a claim you are making, without evidence.

Wait, you mean this HN comment didn't casually solve the hard problem of consciousness?

buster|4 months ago

The sentence "It is astoundingly easy to push an LLM over to collapse into ungrounded nonsense" makes me wonder.

How easy? What specific methods accomplish this? Are these methods fundamentally different from those that mislead humans?

How is this different from exploiting cognitive limitations in any reasoning system—whether a developing child's incomplete knowledge or an adult's reliance on heuristics?

How is it different from fake news, and from adults taking fake news at face value and replicating bullshit?

Research on misinformation psychology supports this parallel. According to https://www.sciencedirect.com/science/article/pii/S136466132...:

  "Poor truth discernment is linked to a lack of careful reasoning and relevant knowledge, as well as to the use of familiarity and source heuristics."

Perhaps human and LLM reasoning capabilities differ in mechanism but not in fundamental robustness against manipulation?

Maybe the only real difference is our long term experience and long term memory?

luisml77|4 months ago

Complex output can sometimes give you the wrong idea, I agree. For instance, a study Anthropic did a while back showed that, when an LLM was asked HOW it performed a mathematical computation (35 + 59), the answer it gave differed from the mechanistic interpretation of the layers [1]. This showed LLMs can be deceptive. But they are also trained to be deceptive: supervised fine-tuning is imitation learning, which leads the model to give the conventional explanation, such as "first I sum 5+9, then carry the remainder... etc", rather than actually examining its past keys and values. But that does not mean it can't examine them. They encode the intermediate results of each layer, and can be examined to identify patterns.

What the Anthropic researchers did was examine how the tokens for 35 and 59 were fused together in the layers, comparing them against other tokens such as 3, 5, and 9. For an LLM, tokens are high-dimensional concepts. This is why you can compare the vectors to each other, measure their similarity, and thereby break down the thought process. This is exactly what I have been describing above. Underneath each token prediction, this black magic is happening: the model fuses concepts through weighted summation of vectors (attention scores), then the merged representations are passed through the MLPs to produce a refined, fused idea, often adding new knowledge stored inside the network. And this continues layer after layer, a repeated combination of concepts that starts with the structure and order of the language itself and ends with the manipulation of complex mathematical concepts, almost detached from the original tokens.
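The fuse-then-refine step described above can be sketched in a few lines of NumPy. This is a single toy attention + MLP block with random weights, not a real trained model; the token labels are purely illustrative. It shows the mechanics: attention builds a weighted sum of value vectors (the "fusion"), the MLP non-linearly refines the result, and cosine similarity is the kind of vector comparison interpretability work uses to break the computation down.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8                                    # toy embedding dimension
tokens = rng.normal(size=(3, d))         # e.g. embeddings for "35", "+", "59"

# Attention: each position forms a weighted sum (a "fusion") of all values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
scores = softmax(Q @ K.T / np.sqrt(d))   # how much each token attends to each other
fused = scores @ V                       # merged representation per position

# MLP: refines the fused vector through a non-linearity ("adding knowledge").
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
refined = np.maximum(fused @ W1, 0.0) @ W2

# Cosine similarity compares a fused vector against other concept vectors,
# roughly how the Anthropic-style analysis breaks down the thought process.
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

A real transformer stacks dozens of these blocks with residual connections and layer norms, which is where the layer-after-layer recombination comes from.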

Even though complex output can be deceptive about the underlying mental model used to produce it, in my personal experience LLMs have produced output for me that must imply extremely complex internal behaviour, with all the characteristics I mentioned before. Namely, I frequently program with LLMs, and there is simply zero probability that their output tokens could exist WITHOUT first having thought at a very deep level about the unique problem I presented to them. And I think anyone who has used these models as extensively as I have knows that behind each token there is this black magic.

To summarize, I am not being naive and believing everything my LLM says to me. Rather, I know very intimately when the LLM is deceiving me and when it is producing output whose underlying mental model must have been very advanced. And this comes from personal experience playing with this technology, both inference and training.

[1] https://www.anthropic.com/research/tracing-thoughts-language...

ozgung|4 months ago

> What most people fail to realise is that in between each token being generated, black magic is happening in between the transformer layers.

Thank you for saying that. I think most people have an incomplete mental model of how LLMs work, and it's very misleading for understanding what they really do and can achieve. "Next-token prediction" happens only at the output layer; it's not what really happens internally. The secret sauce is in the hidden layers of a very deep neural network. There are no words or tokens inside the network. A transformer is not the simple token estimator most people imagine.
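The "tokens only at the edges" point can be sketched directly (random weights, a plain tanh layer standing in for real transformer blocks, all names illustrative): token ids exist only at the embedding lookup and the final output projection; everything in between is continuous vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, n_layers = 100, 16, 4

embed = rng.normal(size=(vocab, d))            # token id -> vector (input edge)
blocks = [rng.normal(size=(d, d)) for _ in range(n_layers)]
unembed = rng.normal(size=(d, vocab))          # vector -> logits (output edge)

h = embed[[7, 42, 3]]                          # inside the network: just vectors
for W in blocks:
    h = np.tanh(h @ W)                         # hidden layers never see token ids

logits = h[-1] @ unembed                       # tokens reappear only here,
next_token = int(np.argmax(logits))            # at the final output projection
```

Everything between `embed` and `unembed` is the part "next token prediction" fails to describe.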

luisml77|4 months ago

Yes, exactly! Finally someone who understands this.