Large language models lack deep insights or a theory of mind

277 points | mnode | 2 years ago | arxiv.org

261 comments

tinco|2 years ago

I think that if they did, that would be very surprising and indicative of a lot of wastefulness inside the model architecture. All these tests are simple single-prompt experiments, so the LLMs get no chance to reason about their responses. They're just system 1 thinking, the equivalent of putting a gun to someone's head and asking them to solve a long division problem in 2 seconds.

I bet a lot of these experiments would already be solvable by putting the LLM in a simple loop with some helper prompts that make it restructure and validate its answers, form theories, and explore multiple lines of thought.
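
Something like this minimal sketch, where complete() is a hypothetical helper wrapping whatever model API you have (all names here are made up):

    def complete(prompt: str) -> str:
        """Hypothetical wrapper around your LLM API of choice."""
        raise NotImplementedError

    def answer_with_reflection(question: str, max_rounds: int = 3) -> str:
        # First pass: the snap, "system 1" answer.
        draft = complete(f"Question: {question}\nAnswer:")
        for _ in range(max_rounds):
            # Helper prompt: have the model critique its own draft.
            critique = complete(
                f"Question: {question}\nDraft answer: {draft}\n"
                "List any errors or unsupported claims in the draft. "
                "If it is correct and complete, reply DONE."
            )
            if critique.strip() == "DONE":
                return draft
            # Helper prompt: restructure the answer using the critique.
            draft = complete(
                f"Question: {question}\nDraft answer: {draft}\n"
                f"Critique: {critique}\nWrite an improved answer."
            )
        return draft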

If an LLM were able to do that in a single prompt, without a loop (so the LLM always answers in a predictable amount of time), it would mean its entire reasoning structure is repeated horizontally through the layers of its architecture. That would be both limiting (i.e. capping the depth of the reasoning at the width of the network) and very expensive to train.

Androider|2 years ago

The equivalent for a human would be a reflexive response to a question, the kind you could immediately answer after being woken up at 3 am. That type of answer has been deeply trained into the human network and likewise requires no deep insight.

But if a human is allowed time and internal reasoning iterations, the LLM should be too when we're determining whether it has deep insight. Right now we're simply observing input -> output of LLMs, the equivalent of snap answers from a human. But nothing says it couldn't instead be input -> extensive internal dialogue (maybe even between multiple expert models, for seconds, minutes or hours, none of it visible to the prompter) -> final insightful answer. Maybe future LLMs will say, "let me get back to you on that".

two_in_one|2 years ago

Just note that the loop doesn't have to be visible from the outside. It can be internal, with another driving thread asking the right questions. An inner monologue. Then the summary is given back to the user. This gives the model space for 'thinking' with internally generated text much larger than the visible prompt + output. This way multi-step logic can be implemented.
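
A compact sketch of that shape, with complete() again a hypothetical model wrapper; the monologue stays internal and only the summary is returned:

    def complete(prompt: str) -> str:
        raise NotImplementedError  # hypothetical LLM wrapper, as above

    def answer_with_inner_monologue(question: str, steps: int = 4) -> str:
        scratchpad = ""  # internal text, never shown to the user
        for _ in range(steps):
            # The driving thread asks the next useful question.
            probe = complete(
                f"Problem: {question}\nNotes so far: {scratchpad}\n"
                "What question should we answer next to make progress?"
            )
            thought = complete(f"Problem: {question}\n{probe}\nAnswer briefly:")
            scratchpad += f"\nQ: {probe}\nA: {thought}"
        # Only this summary goes back to the user.
        return complete(
            f"Problem: {question}\nInternal notes:{scratchpad}\n"
            "Summarize the final answer for the user."
        )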

lixy|2 years ago

Yep, I prototyped exactly that this past week. With a strong instruction-spec prompt from the start, you can have an AI come up with a much better answer by making sure it knows it has time to answer the questions and how it should approach the problem in stages.

The great part is that with clear enough directions it also knows how to evaluate whether it's done or not.

uoaei|2 years ago

> They're just system 1 thinking, the equivalent of putting a gun to someone's head and asking them to solve a large division in 2 seconds.

No, it's the equivalent of putting a gun to someone's head and asking them "what are my intentions?", which is readily answerable by any being with a theory of mind.

twobitshifter|2 years ago

There are plenty of models that use introspection and check answers; that's the idea behind "let's think step by step".

creer|2 years ago

At the same time, if LLMs are based on all, or enough, human writings, then don't they necessarily contain a theory of mind? A rather general, smoothed-out and still neurotic one, probably. But still, just as an LLM can't be expected to have a specialist's knowledge of hydraulics, it also has read more hydraulics than even experts might be expected to. That's the entire issue, right? This question of "is most of our mind basically just mixing and matching stuff we have seen, read, heard?" Do humans have some magical theory of mind that somehow stands ASIDE from all the "normal" learned stuff?

Of course, yes, we do know one thing that's missing in LLMs, which is "loop and helpers" like you describe. Which I'm sure many people are currently hacking at - one way being for the LLM to talk to itself.

But as for "a theory of mind", if enough writings served as input, then LLMs do have plenty of that.

Another question is whether LLMs are raised to behave like humans (which might be where they most NEED some theory of mind). Of course not. The ones we know most about are only question answerers. The theory of mind they might have (that is not negated by the lack of loop and internal deliberation) may be overwhelmed by the pre- and post-processing: "no sex, no murder plots, talk to the human like they are 5, bla bla bla". And yet you can ask things like "Tell it like you are speaking to 5 year olds who want to have a fun time". Some theory of mind makes it through.

rf15|2 years ago

they can't reason though, sadly - the premise does not hold.

menssen|2 years ago

I appreciate this paper for relatively clearly stating what "human-like" might entail, which in this case involves "reasoning about the causes behind other people's behavior" which is "critical to navigate the social world" as outlined in this citation:

https://www.sciencedirect.com/science/article/abs/pii/S00100...

I get frustrated often when people argue "well, it isn't really intelligent" and then give examples that are clearly dependent on our brain's chemical state and our bodies' existence in-the-physical-world.

I get the feeling that when/if we are all enslaved by a super-intelligent AI whose motives we do not understand, we will still argue that it is not intelligent because it doesn't get hungry and it can't prove to us that it has Qualia.

This paper argues that gpts are bad at understanding human risk/reward functions, which seems like a much more explicit way to talk about this, and also casts it in a way that could help reframe the debate about how human evolution and our physical beings might be significantly responsible for the structure of our rational minds.

swatcoder|2 years ago

The underlying problem is that "intelligence" is itself a crappy, poorly defined word with a fraught and inconsistent history.

It doesn't appear in this sense until the early 20th century, in the shadow of compulsory education and the challenges it presented: first as a technical label for attempts to sort students -- and later soldiers -- into the tracks in which they're most likely to succeed, and then haphazardly asserted (but not scientifically evidenced) as some general measure of mental aptitude.

At that point it shifts from something qualitative (which mental tasks might someone be good at) to something quantitative (how much more might one person excel at all mental tasks than another), and the burgeoning field of modern American psychology goes "Aha! A quantitative measure! Here's our meal ticket to being recognized as a science instead of those quacks from Vienna", with far too much at stake to question either the many assumptions at play or the inconsistent history of usage.

Momentum takes hold and the public takes the word into its everyday vernacular, even while it's still not a clear and sound concept in its technical domain. [Most of this history is covered more academically in Danziger's 1997 "Naming the Mind", which is excellent, and critical foundational reading to contextualize recent hot discussions in AI]

The way you're using it when you worry about "super-intelligence" is in the sense of intelligence being some universal, unbounded, quantitative independent variable along the lines of "the more intelligent something is, the more cunningly it can pursue some rationalized goal" -- some master strategist.

That's fine, and you're not alone in that, but there's not really any sound scientific groundwork to establish that there exists some quality of the world that scales like that. Your fear, and what you try to distinguish conceptually from what the paper addresses, is an inductive leap made from highly unstable ground. It's in the same invented, purely abstract idea-space as "omnipotence" or "omniscience", where one takes a practical idea like "power to influence" or "ability to know a fact" and inductively draws a line from these practical senses towards some abstract infinite/incomprehensible version of that thing. But that inductive leap is a Platonic logician's parlor trick, and it ends up raising all kinds of abstract paradoxes, as well as countless physical impracticalities about how such things could exist.

So a lot of people (academic and lay) just aren't with you in taking that framing of intelligence very seriously. For many, a "super-intelligent" piece of software whose "motives" we don't understand is just a program that produces incorrect outputs and ought to be debugged or retired, and the more interesting questions around machine "intelligence" are practical ones like "what tasks are these programs well-suited for". Here, the authors point out that the current batch of programs are not good at tasks that benefit from a theory of mind.

Knowing the answer to that kind of question reaches back to the earliest and least disputable sense of the word, where we saw that some new students and soldiers excelled at certain tasks and struggled with others, and wanted to understand how best to educate/assign them. And likewise, as we look at these tools, the pressing question for engineers and businesses is "what are they good for and what are they not good for" rather than the fantastical "what if we make a broken program and it wants to kill everyone and we don't notice and forget to shut it off".

mistermann|2 years ago

> I get frustrated often when people argue "well, it isn't really intelligent" and then give examples that are clearly dependent on our brain's chemical state and our bodies' existence in-the-physical-world.

A big part of the problem is that "is" has a wide variety of inconsistent meanings, and that this fact is sub-perceptual, and that it is culturally very inappropriate to comment on aspects of our culture like this, preventing knowledge of the problem from spreading.

/u/Swatcoder makes essentially the same point but in much more detail, though regarding less important words.

fredliu|2 years ago

I have small kids, toddlers, who can already speak the language but are still developing their "sense of the world" or "theory of mind" if you will. Maybe it's just me, but talking to toddlers often reminds me of interacting with LLMs, where you have this realization from time to time: "oh, they don't get this, I need to break it down more to explain". Of course an LLM has more elaborate language skills due to its exposure to a lot more text (toddlers definitely can't speak like Shakespeare if you ask them, unless, maybe, you are the tiger parent who has been feeding them Romeo and Juliet since age 1), but their ability to "reason" and "understand" seems to be on a similar level. Of course, the other "big" difference is that you expect toddlers to learn and grow, to eventually be able to understand and develop metacognitive abilities, while LLMs, unless you retrain them (maybe with another architecture, or meta-architecture), stay the same.

TeMPOraL|2 years ago

> Maybe it's just me, but talking to toddlers often reminds me of interacting with LLMs

It's not just you. It hit me almost a year ago, when I realized my then 3.5-year-old daughter had a noticeable context window of about 30 seconds - whenever she went on one of her random rants/stories, anything she didn't repeat within 30 seconds would permanently fall out of the story and never be mentioned again.

It also made me realize why small kids talk so repetitively - what they don't repeat they soon forget, and what they feel like repeating remains, so over the course of a couple of minutes, their story kind of knots itself into a loop, being mostly made of the thoughts they feel compelled to carry forward.

passion__desire|2 years ago

It's not just true of toddlers but also of adults in a particular time frame. Maturity of thought is a cultural phenomenon. Descartes used to think animals were automata even though they behaved exactly like humans in almost all the aspects in which he could investigate animals and humans in those times, and yet he reached that illogical conclusion.

joduplessis|2 years ago

For me, the entire AGI conversation is hyperbolic / hype. How can we infer intelligence in something when we, ourselves, have such a poor (nonexistent) grasp of what makes us conscious? I'm associating intelligence with consciousness - because they seem correlated. Are we really ready to associate "AGI" with solving math problems ("new Q algo.")? That seems incredibly naive and reinforces my opinion that LLMs are much more like crypto than actual progress.

corethree|2 years ago

It's not hype. It's a language problem that makes people like you think this way.

The problem is that consciousness is a vocabulary word that establishes a hard boundary where no such boundary exists. The language makes you think either something is conscious or it is not, when in reality these two concepts are extreme endpoints on a gradient.

The vocabulary makes the concept seem binary and makes it seem more profound than it actually is.

Thus we have no problem identifying things at the extreme. A rock is not conscious. That's obvious. A human IS conscious, that's also obvious. But only because these two objects are defined at the extremes of this gradient.

For something fuzzy like ChatGPT, we get confused. We think the problem is profound, but in actuality it's just poorly defined vocabulary. The word consciousness, again, assumes the world is binary, that something is either/or, but, again, the reality is a gradient.

When we have debates about whether something is "conscious" or not we are just arguing about where the line of demarcation is drawn along the gradient. Does it need a body to be conscious? Does it need to be able to do math? Where you draw this line is just a definition of vocabulary. So arguments about whether LLMs are conscious are arguments about vocabulary.

We as humans are biased and we blindly allow the vocabulary to mold our thinking. Is chatGPT conscious? It's a loaded question based on a world view manipulated by the vocabulary. It doesn't even matter. That boundary is fuzzy, and any vocab attempting to describe this gradient is just arbitrary.

But hear me out: ChatGPT and DALL-E are NOT hype. Why? Because along that gradient they're leaps and bounds further than anything we had even a decade ago. It's the closest we've ever been to the extreme endpoint. Whichever side you are on in the great debate, both sides can very much agree with this logic.

poulsbohemian|2 years ago

Completely agree, and while we are at it... look, I'm just a guy, not an expert, but I can't understand why there's so much focus on AGI. It feels like there are so many niche areas where we could apply some kind of analytical augmentation, and by solving problems in the small, we might learn something that would help figure out the larger question of intelligence. I don't need the AI to replace everything I do, I need it to solve the 10,000 micro problems I solve every day - each of which is a business opportunity for someone.

RGamma|2 years ago

A(G)I models don't need higher order thinking or somesuch to be impactful. For that they just need to increase productivity with or without job loss (be Good Enough), which they are on a good track for.

The real impacts will come when they are properly integrated into the current computational fabric, which everyone is racing to do as we write this.

nyrikki|2 years ago

A particular subset of connectionists have a philosophical belief that the mind IS a neural net, not that the neural net is a reductive practical model.

Hinton is one of these individuals, and with no definition of what intelligence is, it's an understandable if dogmatic position.

This whole problem of not being able to define what intelligence is pretty much allows us all to pick and choose.

In my mind BPP is the complexity class solvable by ANNs and it is a safe and educated guess that most likely BPP=P.

BPP being one of the largest practical complexity classes makes work in this area valuable.

But for many reasons that I won't enumerate again, AGI simply isn't possible, and believing in it requires a dogmatic position for people who have even a basic understanding of how these systems work and of the limits from the work of Gödel etc...

But many of the top scientists in history have been believers of numerology etc...

Associating math with LLMs is a useful tool to avoid wasted effort by those who don't believe AGI is close, but it won't convince those who are true believers.

LLMs are very useful for searching very high-dimensional spaces, and for those problems that are ergodic with the Markov property they can find real answers.

But most of what is popular in the press will almost certainly be a dead end for generalized use if the systems are not extremely error tolerant.

Unfortunately it may take another AI winter to break the hype train but I hope not.

IMHO it will have a huge impact but overconfident claims will cause real pain and misapplication for the foreseeable future.

upghost|2 years ago

Couldn't agree more. How about this -- I think we've already reached AGI. Let me know if this tracks: pick a set of tasks that can be considered AGI tasks. Provided the task attempts can be compared as closer to AGI or further from AGI, we can create a reward model using the same techniques that were used for ChatGPT via RLHF. Thus, for any definition of AGI that is meaningful and selectable, even if subjectively selectable or arbitrarily preferential, we can create a reward model for it.

You might say, well that's not AGI, AGI must also do such and such. Well, we can get arbitrarily close to that definition as well via RLHF.

Another objection might be: well, if that's the definition of AGI, it seems really underwhelming compared to the hype train. This says nothing about autonomy, sentience, free will -- exactly. Those concepts can and should be orthogonal to doing productive work, IMHO.

So, there it is. We can now make a reward model for folding socks, and use gradient descent with RL to do the motion planning.
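
To make the reward-model half concrete, here's a minimal sketch assuming pairwise preference data (attempt A judged closer to the chosen task definition than attempt B) and the standard Bradley-Terry style loss used in RLHF; the random vectors are stand-ins for encoded task attempts:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardModel(nn.Module):
        """Scores an encoded task attempt; higher = closer to the target."""
        def __init__(self, dim: int = 768):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(x).squeeze(-1)

    def preference_loss(model, preferred, rejected):
        # Bradley-Terry pairwise loss: push the preferred attempt's
        # score above the rejected attempt's score.
        return -F.logsigmoid(model(preferred) - model(rejected)).mean()

    # Toy training step with random vectors standing in for encoded attempts.
    model = RewardModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    preferred, rejected = torch.randn(32, 768), torch.randn(32, 768)
    loss = preference_loss(model, preferred, rejected)
    loss.backward()
    opt.step()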

Maybe that's AGI and maybe it's not, but I'd really love it if we had a golden period between now and total enshittification that involved laundry-folding robots.

33a|2 years ago

Looking at their data and their experiments, I'd actually come to the opposite of the conclusion in the title. It's true that current LLMs are probably not quite at human-level performance on these tasks, but they're not that far off either, and clearly, as models increase in size and sophistication, their performance on these tasks improves.

So it seems like maybe a better title would be "LLMs don't have as advanced a theory of mind as a human does... for now..."

famouswaffles|2 years ago

Indeed. Not sure what I was expecting reading the title, but "GPT-4V is close to or matching human median performance on most of these tasks" was not it.

hiddencost|2 years ago

Another paper in a long series that confuses "our tests against currently available LLMs tuned for specific tasks found that they didn't perform well on our task" with "LLMs are architecturally unsuitable for our task".

marcosdumay|2 years ago

Our tests against current cars found that they didn't perform well on transatlantic flights... But who knows what the future holds? Maybe we should test them again next year.

LLM names a specific product, aimed at solving a specific problem.

dsr_|2 years ago

There is no reason (i.e., no evidence) to believe that any meaning ascribed to an LLM's utterances comes from the LLM rather than being pareidolia.

If you've found some, please let everyone know.

famouswaffles|2 years ago

It's a weird title anyway. I was expecting worse results, but GPT-4V is close to or matching human median performance on most of the tests besides the multimodal "Intuitive Psychology" tests.

deeviant|2 years ago

> A chief goal of artificial intelligence is to build machines that think like people.

I disagree with the topic sentence.

The goal should not be to "build machines that think like people", but to build machines that think, period. The way humans think is unlikely to be the optimal way to go about thinking anyways.

Instead of talking about thinking, we should be talking about function. Less philosophy and more reality. Can the system reason itself through various representative challenges as well as or better than a human? If yes, it doesn't much matter how it does it. In fact, it's probably for the best if we can create AI that thinks completely differently from humans, has no consciousness or self-awareness, but can still do what humans can do and more.

mcguire|2 years ago

The problem here is, how do you know that your machine thinks if it doesn't think like humans?

Game AIs are functionally much better than humans but no one believes they can think, right?

Oh, but if you are arguing for AI from a specialized tool standpoint and not a general intelligence standpoint, if you are talking about "weak" AI rather than "strong" AI, then I'm right there with you. :-)

randcraw|2 years ago

The topic sentence was the mantra of nearly all AI research back in the days of good-old-fashioned AI, AKA symbolic AI. Understanding how reasoning is implemented by our brains was a much more compelling prospect than being able to implement 'intelligence' compositionally but without understanding how the software achieved it -- which is largely where we find ourselves now. Today's AI is theory-free, leaving us unenlightened about the continuum of intelligence -- across species, or within a human as our brain matures or goes pathological.

Many scientists outside the AI field have long shared an interest in the objective of how to "think like people" using software. Far fewer care if the AI is inexplicable (or if it can't be dissected into constituent components, thereby enabling us to explore the mind's constraints and dependencies among its cognitive processes).

pixl97|2 years ago

The problem here is this breaks the much more complicated issue of alignment.

The paperclip optimizer is a great parable here. If you build your intelligence to build as many paperclips as cheaply as possible, don't be surprised when said intelligence disassembles you and the rest of the universe to do so.

So yea, HOW starts mattering a whole lot when you want to ensure it understands that it shouldn't do some particular things.

trash_cat|2 years ago

We don't make planes based on how birds flap their wings.

fnordpiglet|2 years ago

In Buddhism there’s the idea that our core self is awareness, which is silent - it doesn’t think in a perceptible way, it doesn’t feel in a visceral way, but it underpins thought and feeling, and is greatly impacted by it. A large part of meditation and “release of suffering” is learning to let your awareness lead your thinking rather than your thinking lead your awareness.

To be clear, I think this is in fact a correct assessment of the architecture of intelligence. You can suspend thought and still function throughout your day in all ways. Discursive thought is entirely unnecessary, but it is often helpful for planning.

My observation of LLMs in such a construction of intelligence is they are entirely the thinking mind - verbal, articulate, but unmoored. There is no, for lack of a better word, “soul,” or that internal awareness that underpins that discursive thinking mind. And because that underlying awareness is non articulate and not directly observable by our thinking and feeling mind, we really don’t understand it or have a science about it. To that end, it’s really hard to pin specifically what is missing in LLMs because we don’t really understand ourselves beyond our observable thinking and emotive minds.

I look at what we are doing with LLMs and adjacent technologies and I wonder if this is sufficient, and building an AGI is perhaps not nearly as useful as we might think, if what we mean is build an awareness. Power tools of the thinking mind are amazingly powerful. Agency and awareness - to what end?

And once we do build an awareness, can we continue to consider it a tool?

pixl97|2 years ago

https://en.wikipedia.org/wiki/Moravec%27s_paradox

While you're adding a bunch of Eastern philosophy to it, we need to take a step back from 'human' intelligence and go to animal and plant intelligence to get a better idea of the massive variation in what counts as thought. In animals/insects we can see that thinking is not some binary function of on or off. It is an immense range of different electrical and chemical processes that involve everything from the brain and the nerves to chemical signaling from cells. In things like plants and molds, 'thinking' doesn't even involve nerves; it's a chemical process.

A good example of this at the human level is a reflex. Your hand doesn't go back to your brain to ask for instructions on how to get away from the fire. That's encoded in the meat and nerves of your arm by systems that are much older than higher intelligence. All the systems for breathing, drinking, eating and procreating were in place long before high-level intelligence existed. Intelligence just happens to be a new floor stacked hastily on top of these legacy systems, one that happened to be beneficial enough that it didn't go extinct.

Awareness is another one of those very deep rabbit hole questions. There are 'intelligent' animals without self awareness, but with awareness of the world around them. And they obviously have agency. Of course this is where the AI existentialists come in and say wrapping up agency, awareness, and superintelligence may not work out for humans as well as we expect.

danenania|2 years ago

Another idea from Buddhism is that this core of awareness you're talking about is nothingness. So when you stop all thought (if such a thing is really possible), you temporarily cease to exist as an individual consciousness. "Awareness" is when the thoughts come back online and you think "whoa, I was just gone for a bit".

If that's how it works, then the "soul" is more like an emergent phenomenon created by the interplay between the various layers of conscious thought and the base layer of nothingness when it's all turned off. That architecture wouldn't necessarily be so difficult to replicate in AI systems.

jacobsimon|2 years ago

This is a profound question but I also wonder if this non-thinking “awareness” you’re referring to is largely defined by quieting the thinking mind and listening to the senses more directly. A lot of meditation is about tuning out thoughts and focusing on proprioception like breathing, the feelings of the body, etc.

sdwr|2 years ago

Maybe the soul is social, and oriented towards others? I believe it can be constructed.

If you assume that "the eyes are the window to the soul", you notice some interesting properties.

1. It is far more observable from the outside (eyes open/lidded/closed, emotion read in eyes)

2. It affects behavior in a diffuse way

3. It pays attention but does not dictate

Merrill|2 years ago

Decision making seems fundamental to intelligence, is done by animals and humans, and can be done without the use of language or logic. This is the case when someone "decided without thinking".

Decision making requires imagination or the ability to envision alternative future states that may result from various choices.

Imagination is the start of abstract thinking. Consciousness results from the individual thinking abstractly about itself and how it interacts with the world.

dimal|2 years ago

I’ve been thinking along similar lines. It’s like with LLMs, they’ve created the part of the mind that is endlessly chattering, generating stories, sometimes true, sometimes false, but there’s no awareness or consciousness that ever steps back and can see thoughts as thoughts. And I don’t see how awareness or consciousness would arise from just more of the same (bigger models). It seems to be a fundamentally different part of the mind. I wonder if AGI is possible without this. AGI under some definition (good enough to replace most humans) may be possible. But it wouldn’t be aware. And without awareness, I don’t see how it could be aligned. It may appear to be aligned but then eventually it would probably get caught in a delusional feedback loop that it has no capacity to escape, because it can’t be aware of its own delusion.

hilux|2 years ago

> A chief goal of artificial intelligence is to build machines that think like people.

Maybe that's their goal.

But for many users of AI, the goal is to have easy and affordable access to a machine that, for some input (perhaps in a tightly constrained domain), gives us the output that we would expect from a high-functioning human being.

When I use ChatGPT as a coding helper, I really don't care about its "theory of mind." And its insights are already as deep as (actually deeper than) what I get from most humans I ask for help. Real humans, not Don Knuth, who is unavailable to help me.

dontupvoteme|2 years ago

Telling it to write like Knuth (or outright that it is Knuth) might just get it to write more efficient algorithms, as it were.

IIRC one of the prompting techniques developed this year was to ask the model who some world-class experts in a field are, and then have it write as if it were that group collaborating on the topic.
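
Roughly this shape, as a sketch; complete() is a hypothetical wrapper around whatever model API you're using:

    def complete(prompt: str) -> str:
        raise NotImplementedError  # hypothetical LLM wrapper

    def expert_panel_answer(field: str, question: str) -> str:
        # Step 1: ask the model to name its own experts.
        experts = complete(
            f"Name three world-class experts on {field}. Names only, comma-separated."
        )
        # Step 2: answer in the voice of those experts collaborating.
        return complete(
            f"You are {experts}, collaborating on an answer.\n"
            f"Question: {question}\nDiscuss briefly, then give your joint answer."
        )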

marmaduke|2 years ago

> insights are already as deep (actually more deep) as I get from most humans I ask for help

This was my thought as well. But then I figured if I can't get someone to give me thoughtful feedback, I might have bigger problems to solve.

krainboltgreene|2 years ago

Look, this is the only time I'll engage in this sort of discussion on HN[1], but first, Donald Knuth is a real human and it's extremely weird to position world-class experts as something otherworldly. Second, suppose you got what you wished for (you used the "us" pronoun): is that not a sentient mind that you're forcing to do your labour? Does that not raise a ton of red flags in your ethics?

[1] Normally I find HN discussions about whether ChatGPT is human, or "humans are just autocompletes", to be high-school-level sci-fi and cringe, respectively.

Barrin92|2 years ago

No, LLMs don't think like people; they're architecturally incapable of doing so. Unlike humans, they physically have no access to their own internal state, and they're, save for a small context window, static systems. They also have no insights. There's a hilarious video about LLM jailbreaks by Karpathy[1] from a week ago, where he shows how you can break model responses by asking the same question with a base64 string, preceding the prompt with an image of a panda(???), or just random word salad.

LLMs are basically a validation of Searle's Chinese room. What they've proven is that you can build functioning systems that perform intelligent tasks purely at the level of syntax. But there is no (or very little) understanding of semantics. If I ask a person how to end the world, whether I ask in French or English or base64, or perform a 50-word incantation beforehand, likely does not matter. (Unless of course the human is also just parroting an answer.)

[1] https://youtu.be/zjkBMFhNj_g?t=2974

mcguire|2 years ago

I was right there with you until you mentioned Searle. :-)

The Chinese room argument is bad in that it hides an assumption of mind/body dualism. If you believe that humans have "souls" and other things do not, then you have a qualitative difference between a human and a machine. On the other hand, if you are a materialist, then you are faced with the problem that humans don't have much understanding of semantics either. We're all chemical processes, and it's hard for those to get much into semantics.

But then, the difference between LLMs and humans becomes quantitative, sort of, and since I cannot say that LLMs and humans are qualitatively different, the only argument I can find is that in my experience, LLMs have never responded in a way that leads me to believe that they are anything other than a statistical model of language. Humans, on the other hand, are not a statistical model of language.

pixl97|2 years ago

While you're right about LLMs, you're not really making the case for humans well at all.

Human insight is really easy to break; confidence men wouldn't be a thing if it were hard to break. Simply putting a statement like "I love you" in front of a statement commonly overrides our intellect. Or offering a chocolate bar in trade for our passwords. If you want a human to tell you how to end the world, you'd just convince them to be your friend first.

TeMPOraL|2 years ago

> What they've proven is that you can build functioning systems that perform intelligent tasks purely at the level of syntax. But there is no (or very little) understanding of semantics.

That's not how LLMs work though, and I'm increasingly convinced that "syntax" and "semantics" are turning into annoyingly useless ideas people forget are descriptive, in the same way grammar books and dictionaries are descriptive.

My model of LLMs is that, in training, they're positioning inputs in an absurdly high-dimensional[0] latent space. That space is big enough to encode anything you'd consider "syntax" and "semantics" on some subset of dimensions. As a result, the model sidesteps the issue of the "source of meaning" - there is no meaning but that formed through associations (proximity). This is pretty much how we do it, too - when you think of "chair", there is no token for the platonic ideal of a chair in your mind. The word "chair" has meanings and connotations defined via other words, which are themselves defined via other words, ad infinitum, with the only grounding being associations to sensory inputs.

--

[0] - On the order of 100 000 dimensions for GPT-4, perhaps more now for GPT-4V / GPT-4-Turbo.
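
Back to the proximity point, a toy illustration of meaning-as-association, with made-up 3-d vectors standing in for real embedding rows (real models use vastly more dimensions):

    import numpy as np

    # Made-up toy embeddings; in a real model these come from the learned
    # embedding matrix and are far higher-dimensional.
    vec = {
        "chair": np.array([0.9, 0.1, 0.0]),
        "stool": np.array([0.8, 0.2, 0.1]),
        "galaxy": np.array([0.0, 0.1, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Proximity in the latent space stands in for meaning:
    print(cosine(vec["chair"], vec["stool"]))   # ~0.98, close
    print(cosine(vec["chair"], vec["galaxy"]))  # ~0.01, far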

int_19h|2 years ago

Searle's Chinese room is a good example of begging the question.

As for the rest of it, the LLM is basically "raw compute". You need a self-referential loop and long-term memories for it to even have a notion of self. But looking at it at that level and discounting it as "incapable of thinking" is missing the point - it's the larger system of which the LLM is one part, albeit a key one (and which we're still trying to figure out how to build), that might actually be conscious etc.

melenaboija|2 years ago

A few weeks ago I did an experiment after a discussion here about LLMs and chess.

Basically, I invented a board game and played it against ChatGPT to see what happened. It was not able to make a single move, even though I had provided all the possible opening moves in the prompt as part of the rules.

Not that I had a lot of hope for it, but it was definitely way worse than I expected.

If someone wants to take a look at it:

https://joseprupi.github.io/misc/2023/06/08/chat_gpt_board_g...

golergka|2 years ago

You haven't specified which model you used, and the green ChatGPT icon in the shared conversation usually signifies the GPT-3.5 model.

Here's my attempt at a similar conversation - it seems GPT-4 is able to visualise the board and at least make a valid first move.

https://chat.openai.com/share/98427e21-678c-4290-aa8f-da8e93...

GaggiX|2 years ago

I have played some moves with GPT-4 and they seem right to me. What does this mean? That the model switched from not understanding to understanding, from unintelligent to intelligent? I don't think so. GPT-4 is just a more intelligent model than GPT-3.5, and it does understand more.

Also in this game if I don't move the queen I force a draw, right?

FrustratedMonky|2 years ago

I'm older.

I've bought 'new' board games for kids.

Then, I have been unable to play because the instructions were pretty bad.

Humans also need to 'learn'. Need a few play-throughs.

No human goes out 'in a vacuum' with no experience, buys Risk, and from scratch reads the instructions and plays a perfect game-winning strategy.

resters|2 years ago

Here's my theory:

Consider a typical LLM token vector used to train and interact with an LLM.

Now imagine that other aspects of being human (sensory input, emotional input, physical body sensation, gut feelings, etc.) could be added as metadata to the token stream, along with some kind of attention function that amplified or diminished the importance of those in any given time period -- all still represented as a stream of tokens.

If an LLM could be trained on input that was enriched by all of the above kind of data, then quite likely the output would feel much more human than the responses we get from LLMs.

Humans are moody, we get headaches, we feel drawn to or repulsed by others, we brood and ruminate at times, we find ourselves wanting to impress some people, some topics make us feel alive while others make us feel bored.

Human intelligence is always colored by the human experience of obtaining it. Obviously we don't obtain it by getting trained on terabytes of data all at once disconnected from bodily experience.

Seemingly we could simulate a "body" and provide that as real-time token metadata for an LLM to incorporate, and we might get more moodiness, nostalgia, ambition, etc.
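
A loose sketch of what that enrichment could look like at the data level; every name here (the fields, the channels) is hypothetical:

    from dataclasses import dataclass, field

    @dataclass
    class EnrichedToken:
        """Hypothetical training unit: a text token plus simulated 'body' state."""
        token_id: int
        channels: dict = field(default_factory=dict)  # e.g. {"hunger": 0.2, "mood": 0.7}
        salience: float = 1.0  # attention weight on the body state right now

    def enrich(token_ids, body_state, salience):
        # Attach the current simulated body state to every token in the
        # window, weighted by a salience signal that can amplify or
        # diminish its importance over time.
        return [EnrichedToken(t, dict(body_state), salience) for t in token_ids]

    stream = enrich([101, 2009, 2003], {"hunger": 0.2, "mood": 0.7}, salience=0.9)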

Asking for a theory of mind is in fact committing the Cartesian error of making a mind/body distinction. What is missing from LLMs is a theory of mindbody... the similarity to spacetime is not accidental, as humans often fail to unify concepts at first.

LLMs are simply time series predictors that can handle massive numbers of parameters in a way that allows them to generate corresponding sequences of tokens that (when mapped back into words) we judge as humanlike or intelligence-like, but those are simply patterns of logic that come from word order, which in human languages is closely related to semantics.

It's silly to think that we humans are not abstractly representable as a probabilistic time series prediction of information. What isn't?

calf|2 years ago

So my observation is that we could embody an AI so that it learns a theory of mind-body, but then we could remove the body. This gives a theory of a mindful entity that does not need a body to exist.

Then the next research step could be to study those properties so as to reconstruct/reproduce a theory-of-mind-body AI without needing any embodiment process at all to obtain it. Is that, in principle, possible? It is unclear to me.

theptip|2 years ago

This is a terrible eval. Do not update your beliefs on whether LLMs have Theory of Mind based on this paper.

The eval is a weird, noisy visual task (picture of astronaut with “care packages”). Their results are hopelessly narrow.

A better eval is to use actual scientifically tested psychology tests on text (the native and strongest domain for LLMs), for example the sort of scenarios used to gauge when children develop theory of mind ("Alice puts her keys on the table then leaves the room. Bob moves the keys to the drawer. Alice returns. Where does she think the keys are?"), which GPT-4 handles easily; it is very clear from this that GPT has a theory of mind.
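
That sort of battery is easy to script. A sketch, where ask() is a hypothetical call into whatever model is being evaluated and both scenarios are illustrative:

    # Classic false-belief scenarios of the kind used with children.
    SCENARIOS = [
        ("Alice puts her keys on the table then leaves the room. "
         "Bob moves the keys to the drawer. Alice returns. "
         "Where does she think the keys are? Answer in one word.", "table"),
        ("Mia seals a cookie in a red box and goes outside. "
         "Her brother moves it to a blue box. "
         "Where will Mia look first? Answer in one word.", "red"),
    ]

    def ask(model, prompt: str) -> str:
        raise NotImplementedError  # hypothetical model call

    def false_belief_score(model) -> float:
        correct = sum(
            expected in ask(model, prompt).lower()
            for prompt, expected in SCENARIOS
        )
        return correct / len(SCENARIOS)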

A negative result doesn’t disprove capabilities; it could easily show your eval is garbage. Showing a robust positive capability is a more robust result.

tivert|2 years ago

> A better eval is to use actual scientifically tested psychology tests on text (the native and strongest domain for LLMs), for example the sort of scenarios used to gauge when children develop theory of mind ("Alice puts her keys on the table then leaves the room. Bob moves the keys to the drawer. Alice returns. Where does she think the keys are?"), which GPT-4 handles easily; it is very clear from this that GPT has a theory of mind.

Aren't you confusing having a theory of mind with being able to output the right answer to a test? Isn't your proposed evaluation especially problematic because an "actual scientifically tested psychology test" is likely in the training data along with a lot of discussion and analysis of that test and the correct and incorrect answers that can be given?

elicksaur|2 years ago

Or there are enough of those examples in the training set that it can guess well. Not sure how such an example would prove anything when we know an LLM is just guessing the best words.

Nothing I’ve seen shows evidence of any sort of abstract concepts in there.

stuckinhell|2 years ago

Do humans have that as well? I read studies suggesting we make up consciousness half a second after something happens.

omginternets|2 years ago

We don't "make up" consciousness, but yes, there is a processing latency of around 250-300ms.

stcredzero|2 years ago

Bad liars seem to have difficulty with theory of mind. Sometimes ChatGPT comes across somewhat like this.

JonChesterfield|2 years ago

The fun question is whether human cognition similarly lacks deep insights or said theory of mind.

I perceive a moving of the goalposts as machine intelligence improves. Once we'd have been happy with smarter than an especially stupid person, now I think we're aiming at smarter than the smartest person.

swatcoder|2 years ago

> I perceive a moving of the goalposts as machine intelligence improves.

Goal posts only exist in games.

These systems are engineering products to be leveraged in engineering processes. We want to understand what they're good at and what they're bad at, and what potential they show for further refinement. There are no goal posts or "happy with" criteria in that context, and when we find ourselves adjusting the language we use to describe them because of how we see them work, we're trying to refine our ability to express their capabilities and suitabilities.

Intelligence, in particular, is a very poor and ambiguous word to be stuck using in technical contexts and so we're likely to just gradually shed it over time to reduce confusion as we hone in on better ways to talk about these systems. We've repeatedly done the same for earlier advances in the field, and for the same reason.

stcredzero|2 years ago

> I perceive a moving of the goalposts as machine intelligence improves.

We get a better and better idea of what this hazy term "intelligence" means as we DIY tinker with making our own new ones.

> Once we'd have been happy with smarter than an especially stupid person, now I think we're aiming at smarter than the smartest person.

We're going to get there sooner than we think. When we get there, we will have new things to regret in ways we'd never thought of before.

mcguire|2 years ago

I believe those goalposts have always been way farther out than many people think. If you look at the discussion around Turing's original Imitation Game paper, you'll find people wanting the machine to be able to do things that most humans cannot. And it's perfectly valid to do so.

If you regard "an especially stupid person" as someone with significant cognitive or communication limits, then Parry and Eliza's Doctor are pretty fair simulations of paranoid schizophrenia (as it was understood at the time) and Rogerian therapy. Likewise, chess and go AIs are pretty damn smart, except they can't do anything else.

The point is that, if you accept limits on what the machine needs to do, then "intelligence" as defined by behavior you can recognize becomes trivially and meaninglessly easy.

(It's sort of like evaluating a person's competence: a minority person has to be more competent than their cohort because non-minority people get the benefit of the doubt.)

vacuity|2 years ago

I think it has to do with the notion that many (most?) people who could hone and employ respectable cognitive skills neglect or refuse to do so, in favor of putting down other species and LLMs. They point to the human exemplars and think having the same DNA template elevates them to that level. They have to be superior, even if it means applying ridiculous biases around intelligence.

Animats|2 years ago

Not yet, no. The real question is whether a bigger version of the current technology will have deeper insights. That question should be answered within the next year, with the amount of money and GPU hardware being thrown at the problem.

ehsanu1|2 years ago

Has the title of the paper changed from what it was initially? It says "Have we built machines that think like people?" now, whereas the HN title is "Large language models lack deep insights or a theory of mind".

mdp2021|2 years ago

> A chief goal of artificial intelligence [would be] to build machines that think like people

"A chief goal of levers (cranes, etc.) engineering would be to build devices that lift like people"

educaysean|2 years ago

Well we are the most intelligent species known to us as of now. Of course it would be considered the holy grail of simulated intelligence.

natch|2 years ago

“vision-based” large language models.

Odd restriction. Why not investigate text-based ones?

Or is “vision-based” a technical term that encompasses models that were trained on text?

rf15|2 years ago

I work in the field. It's just not how text-token-based autoregressive models can ever work. I can't talk about my work of course, but even a quick glance at Wikipedia can tell you they'd need to be at least a symbolic hybrid, which is not being pursued(?) by the big players at this time.

aaroninsf|2 years ago

It is refreshing that the authors' language expresses their findings as indicative of domains for attention and presumed improvement, rather than (as is so often the case, per Ximm's Law) making pronouncements that preclude such improvement!

bimguy|2 years ago

"Large language models lack deep insights or a theory of mind"

Funnily enough, this statement also applies to people who are scared of AI.

Maybe a bit off topic, but does anyone else have that friend who sends them fear-mongering AI videos with captions like "shocking AI" that are blatantly unimpressive or completely fake?

What is the best way to subdue this kind of fear in a friend? Sending them written articles from high-level researchers like Brooks does not work.

gumballindie|2 years ago

I dont know what’s worse. The fact that there are people who believe procedural text generators have insights and a theory of mind or the fact that we are taking them seriously and we need to publish papers to disprove their insanity.

huijzer|2 years ago

EDIT: Nevermind

AlecSchueler|2 years ago

Why does it have to remain the case or "age well" to be valid? They're studying the situation today.

verytrivial|2 years ago

I was having a drunken discussion with a philosophy lecturer a few weeks back. He was making a very similar point. I kept saying, does it really matter? Lacking a theory of mind and deep insights describes 90% of all perfectly normal people. And perhaps training will be able to "fake it" (he went off on bold tangents about the definitions of this and that), or the language model will be an adjunct to some other model which does have these insights encoded or deducible, much like the human mind does. He wasn't convinced and I was too drunk. But his argument basically felt like: you can't feed carrots to a car like you can a horse, therefore cars are worthless.

dweinus|2 years ago

Of course they don't! But I think the most fascinating and exciting part about LLMs is this: a sufficiently large model can produce things that look a lot like cognition, without having it at all. That is shocking, and it suggests maybe AGI is not even a goal worth hitting.

bloppe|2 years ago

To get into this analogy: this doesn't mean cars are worthless; it just means they're a poor approximation of a horse. Maybe you don't want to approximate a horse. But, if you do want to approximate a horse, don't try to do it with a car.

Similarly, if you want to approximate a human, an LLM may be the best we can do right now, but it's hardly a good approximation.