top | item 45624126

dreambuffer | 4 months ago

Photons hit a human eye and then the human came up with language to describe that and then encoded the language into the LLM. The LLM can capture some of this relationship, but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation, nor generating thoughts. Its "world model" is several degrees removed from the real world.

So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.

ziofill|4 months ago

I agree with this. A metaphor I like is that the reason why humans say the night sky is beautiful is because they see that it is, whereas an LLM says it because it’s been said enough times in its training data.

stouset|4 months ago

To play devil’s advocate, you have never seen the night sky.

Photoreceptors in your eye have been excited in the presence of photons. Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.

Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.

Please note that all of my instincts want to agree with you.

“AI isn’t conscious” strikes me more and more as a “god of the gaps” phenomenon. As AI gains more and more capacity, we keep retreating into smaller and smaller realms of what it means to be a live, thinking being.

amelius|4 months ago

Humans evolved to think the night sky is beautiful. That's also training. If humans were zapped by lightning every time they went outside at night, they would not think that a night sky is beautiful.

del82|4 months ago

I mean, I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word. So I’d agree that the night sky is “beautiful”, but not because I somehow have access to a deeper meaning of the word or the sky than an LLM does.

As someone who (long ago) studied philosophy of mind and (Chomskian) linguistics, it’s striking how much LLMs have shrunk the space available to people who want to maintain that the brain is special & there’s a qualitative (rather than just quantitative) difference between mind and machine and yet still be monists.

HarHarVeryFunny|4 months ago

> humans say the night sky is beautiful is because they see that it is

True, but we could engineer AI to see that too, just as evolution has engineered us to see it.

Our innate emotional responses to things have been honed by evolution to be adaptive, to serve a purpose, but the things that trigger these various responses are not going to be super specific. E.g. we may derive pleasure from eating a nice juicy peach, but that doesn't mean the peach is encoded in our DNA - it's going to be primarily the reaction to sugar/sweetness, a good source of energy, that we are reacting to.

Similarly, we may have an emotional reaction to certain pieces of modern art or artistic expression, but clearly evolution has not selected for those specifically; rather, the artist is triggering innate responses that evolved for reasons other than appreciation of art.

It's hard to guess which innate responses, ones that were actually selected for, are being triggered by the night sky, and I'm also not sure how much of our response is purely visual (beauty) as opposed to wonder or awe. Maybe it's an attraction to the unknown, or a sense of size and opportunity, with these being the universals that are actually adaptive.

In any case, if we figured out the specifics of the hard-wired emotional reactions that evolution has given us, then we could choose to engineer emotional AI with those same reactions, in just as genuine a way as ours.

j16sdiz|4 months ago

Beauty standards change over time - see how people have perceived body fat over the past few hundred years. We learn what is beautiful from our peers.

Taste can be acquired and can be cultural. See how people used to take their coffee.

Comparing a human to an LLM is like comparing something constantly changing to something random -- we can't compare them directly; we need a good model of each before comparing.

klipt|4 months ago

What about a blind human? Are they just like an LLM?

What about a multimodal model trained on video? Is that like a human?

ninetyninenine|4 months ago

Guys, you realize that you can go to ChatGPT right now and it can generate an actual picture of the night sky, because it has seen thousands of pictures and drawings of the actual night sky, right?

Your logic is flawed because your knowledge is outdated. LLMs encode visual data, not just "language" data.

simianparrot|4 months ago

Here's how I've been explaining this to non-tech people recently, including the CEO where I work: Language is all about compressing concepts and sharing them, and it's lossy.

You can use a thousand words to describe the taste of chocolate, but it will never transmit the actual taste. You can write a book about how to drive a car, but it will only at best prepare that person for what to practice when they start driving, it won't make them proficient at driving a car without experiencing it themselves, physically.

Language isn't enough. It never will be.
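A toy sketch of this point (my own illustration, not anything from the thread): if you treat an experience as a high-resolution signal and a verbal description as a coarse encoding of it into a small vocabulary, then reconstructing the experience from the words alone always leaves a residual error. The names and the vocabulary size here are arbitrary choices for the demo.

```python
import random

random.seed(0)
# A "rich sensory experience": 1000 fine-grained samples in [0, 1).
experience = [random.random() for _ in range(1000)]

def describe(signal, vocabulary_size=8):
    # Lossy compression: map each sample to one of a few "words"
    # (coarse quantization levels).
    return [round(x * (vocabulary_size - 1)) for x in signal]

def imagine(description, vocabulary_size=8):
    # Reconstruct the experience from the words alone.
    return [w / (vocabulary_size - 1) for w in description]

reconstructed = imagine(describe(experience))
error = sum(abs(a - b) for a, b in zip(experience, reconstructed)) / len(experience)
print(f"mean reconstruction error: {error:.4f}")  # nonzero: detail is gone
```

No matter how cleverly `imagine` interpolates, the fine detail discarded by `describe` is unrecoverable - which is the sense in which a thousand words about chocolate never transmit the taste.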

subjectivationx|4 months ago

The taste-of-chocolate example also assumes that information-theoretic models of meaning are correct, rather than a use-based, pragmatic theory of meaning.

I don't agree with information-theoretic models in this context, but we come to the same conclusion.

"Loss" only makes sense if there was a fixed original, but there is not. The information-theoretic model turns this into a solvable engineering problem; with LLMs, we just aren't solving the right problem.

I think it is more than that. The path forward with a use theory of meaning is even less clear.

The driving example is actually a great example of the use theory of meaning and not the information-theoretic.

The meaning of “driving” emerges from this lived activity, not from abstract definitions. You don't encode an abstract meaning of driving that is then transmitted on a noisy channel of language.

The meaning of driving emerges from the physical act of driving. If you only ever mount a camera on the headrest and operate the steering wheel and pedals remotely from a distance you still don't "understand" the meaning of "driving".

Whatever data stream you want to come up with, trying to extract the meaning of "driving" from that data stream makes no sense.

Trying to extract the "meaning" of driving from driving language game syntax with language models is just complete nonsense. There is no meaning to be found even if scaled in the limit.

bwfan123|4 months ago

Humans perceive phenomena via the senses, and then carve out categories or concepts to understand them. This is a process of abstraction, and each idea has associated qualia. We then use language to describe these concepts. As such, a concept is grounded either by actual phenomena or operations, or is a composition of other grounded concepts. Creating categories and grounding them involves constant feedback from the environment - it is a creative process, and we as agents have "skin in the game", in the sense that we get the rewards/punishments for our understanding and actions.

Map vs Territory is a common analogy. Maps describe territories, but in an abstract and lossy manner.

But most of us don't construct grounded concepts in our understanding. We carry a muddle of ungrounded ideas - some told to us by others, some we intuit directly. There is a long tradition of attempting to think clearly, from Socrates to Descartes to Feynman, in which an attempt is made to ground the ideas we have. Try explaining your ideas to others, and soon you will hit the illusion of explanatory depth.

An LLM is a map, and a useful tool, but it doesn't interact with the territory and doesn't have skin in the game; as a result, it can't carve new categories in the kind of learning process we have as humans.

adrianN|4 months ago

The human experience is also several degrees removed from the "real" world. I don't think sensory chauvinism is a useful tool in assessing intelligence potential.

ninetyninenine|4 months ago

This comment is hallucinatory in nature, as it is in direct conflict with the on-the-ground reality of LLMs.

The LLM has both light (aka photons) and language encoded into its very core. It is not just language. You seem to have missed the boat with all the AI-generated visuals and videos that are now inundating the internet.

Your flawed logic is essentially that LLMs are unable to model the real world because they don't encode photonic data into the model; instead, you think they only encode language data, which is an incredibly lossy description of reality. This line of logic flies in the face of the ground-truth reality that LLMs ARE trained on video and pictures, which are essentially photons encoded into data.

So what is the proper conclusion? Look at the generated visual output of these models. They can generate video that is highly convincing, often flawed, but at times indistinguishable from reality. That means the models have very good, but flawed, simulations of reality.

In fact those videos demonstrate that LLMs have an extremely high causal understanding of reality. They know cause and effect; it's just that the understanding is imperfect. They understand like 85 percent of it. Just look at those videos of penguins on trampolines: the LLM understands what happens after a penguin jumps on a trampoline, but sometimes an extra penguin teleports in, which shows that the understanding is high but not fully accurate or complete.

tauwauwau|4 months ago

> but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation

Neither is the animal brain. It's processing the signals produced by its sensors. Once the world model is programmed/auto-built in the brain, it doesn't matter whether it's sensing real photons; it just has input pins like a transistor, or arguments like a function. As long as we provide the arguments, it doesn't matter how those arguments are produced. LLMs are no different in that respect.

> nor generating thoughts

They do, during the chain-of-thought process. Generally there's no incentive to let an LLM keep mulling over a topic, as that is not useful to the humans, and they make money only when the gears start turning in response to a question sent by a human. But that doesn't mean an LLM doesn't have the capability to do it.

> Its "world model" is several degrees removed from the real world.

Just because the animal brain has built-in sensors through which it can get data directly from the world, that doesn't mean it's any closer to the world than an LLM. It's still getting ultra-processed signals to feed into its own programming. Similarly, LLMs do interact with the real world through tools, as agents.

> So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.

Again, a person who has gone blind still has the world model created by sight. This person can also no longer generate the chain of events that led to the creation of that model. It still doesn't mean that this person's world model has become inferior.

tim333|4 months ago

Photons can hit my iphone's sensor in much the same way as they hit my retina and the signals from the first can upload to an artificial neural network like the latter go up my optic nerve to my biological neural network. I don't see a huge difference there.

I'll give you the brain is currently better at the world modelling stuff but Genie 3 is pretty impressive.

tomlockwood|4 months ago

This is so uncannily similar to the "Mary's Room" argument in philosophy that I thought you were going there.

LarsDu88|4 months ago

The difference between a human eye and a webcam is mostly an implementation detail, IMO, and has nothing important to say about what underlies "intelligence" or "world models".

It's like saying a component video out cable for the SNES is intrinsically different from an HDMI for putting an image on a screen. They are different, yes, but the outcome we care about is the same.

As for causality, go and give a frontier level LLM a simple counterfactual scenario. I think 4/5 will be able to answer correctly or reasonably for most basic cases. I even tried this exercise on some examples from Judea Pearl's 2018 book, "The Book of Why". The fact that current LLMs can tackle this sort of stuff is strongly indicative of there being a decent world model locked inside many of these language models.

visarga|4 months ago

> then the human came up with language to describe that and then encoded the language into the LLM

No individual human invented language; we learn it from other people, just like an AI does. I'd go so far as to say language was the first AGI - we've been riding the coattails of language for a long time.

scrollop|4 months ago

You're saying that language is an intelligence?

So C++ is an intelligence as well?

An intelligence that can independently make deductions and create new ideas?

pastel8739|4 months ago

And even then, the light hitting our human eyes describes only a fraction of all the light in the world (e.g. it misses the ultraviolet patterns on plants). An LLM's model of the world is shaped by our human view of the world.

StopDisinfo910|4 months ago

Entities equipped with two limited light-sensitive captors encode, through a network of carbon-based chemical emitters, a representation of what their flawed vision system manages to grasp, biased towards self-preservation.

What's the real world? I'm still puzzled by this reaction I see to LLMs - not because I think LLMs are undervalued, but because most people seem to significantly overestimate what human intelligence is.

wan23|4 months ago

Photons reflected off of objects are not the actual objects. I wouldn't go so far as to say that sensing these is a particularly special way to know about things compared to hearing or reading about them. Further, many humans do not sense photons yet seem to manage to have perfectly fine working world models.

manoDev|4 months ago

That’s a good definition: it’s a model of a model.

The debate seems to center on whether language models are meta-models (in the category-theory sense) or mere encodings (in the information-theory sense).

bckr|4 months ago

> Its "world model" is several degrees removed from the real world.

Like insects that weave tokens

dustingetz|4 months ago

what does it mean to “generate thoughts”, exactly?

tsunamifury|4 months ago

Hahahaha I can’t believe you entirely missed the irony here that humans spend all day looking at screens doing the same thing.