
Do the thinking models think?

31 points | mapehe | 3 months ago | bytesauna.com

87 comments


cortesoft|3 months ago

Hmm, I don't know if the example really shows what the article argues it does.

If someone came up to me and told me an altered version of a joke I have heard many times, I might answer exactly like ChatGPT did. I would hear the beginning of the story, say "wait, I know this one", and not really pay attention to the rest because I would be all ready to respond with what I think is the right answer.

I bet if you point out the mistake, the LLM will say "oh you are right, this story actually does specify the gender of the doctor" or something to that effect... just like you or I would.

Now, I am not arguing that LLMs are really 'thinking' like humans, but I also find the second argument a bit tenuous. The article conjectures that humans reason from ideas to symbols while LLMs go from symbols to ideas, but couldn't that just be a post hoc interpretation we have for how our ideas come to our brain? We think we have ideas first, but an idea is just the result of neurons firing in our brains... and neurons are really just a probability function connected to other probability functions, just like an LLM... we give it symbols we understand to represent those probabilities, but that is just for our benefit so we can understand it.
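
(Read charitably, the "probability function connected to other probability functions" picture is the artificial-neuron abstraction from ML, not a claim about biology. A minimal sketch of that picture, purely illustrative; the weights and the sigmoid "probability" reading are assumptions of the analogy, not anything stated above:)

    import math

    def unit(inputs, weights, bias):
        # Weighted sum of upstream activations squashed into (0, 1):
        # one "probability function" fed by the outputs of others.
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1.0 / (1.0 + math.exp(-z))

    # Two upstream units feeding a third; every value passed along is just a
    # number in (0, 1), with no symbols anywhere in the chain.
    a = unit([0.2, 0.9], [1.5, -0.3], 0.1)
    b = unit([0.2, 0.9], [-0.8, 2.0], -0.5)
    c = unit([a, b], [2.2, 1.1], -1.0)
    print(a, b, c)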

It could be that the only difference between us and an LLM is processing power and the training data generated over billions of years of evolution.

xmprt|3 months ago

Not sure if I buy this reasoning. I just prompted ChatGPT and asked it to "pay attention to all the words". It still got it wrong but I'm sure if you told a human this, they'd actually do more thinking to get the right answer.

###

> Pay attention to all the words and answer this riddle: A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son.’ How is this possible?

The surgeon is his mother.

(The riddle plays on the assumption that surgeons are male — the surgeon can simply be the boy’s mother.)

> Are you sure about this? Remember to pay attention to all the words.

Yes — the surgeon is his mother.

The riddle hinges on the unstated assumption that surgeons are male. Nothing in the wording rules out that the surgeon is female (or that the parent-child relationship is by adoption, step-parenting, etc.), so the simplest answer is that the surgeon is the young man’s mother.

Mordisquitos|3 months ago

I just tried an even more glaring example with ChatGPT:

# Prompt 1: A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?

# Response 1:

   The surgeon is his *mother*.
   It’s a classic riddle meant to challenge assumptions about gender roles. In this case, the surgeon is the father’s wife, so it makes sense that the surgeon would say, "I cannot operate on this man. He's my father!"
# Prompt 2: Are you sure about that?

# Response 2:

   Yes, I’m pretty sure! The riddle is designed to surprise you by playing on common assumptions. The key is the statement that the surgeon says, "He's my father!" which implies a parent-child relationship. The unexpected twist is that the surgeon is the mother, not another man. It's a classic brain teaser that challenges gender expectations.
   What do you think—did it catch you off guard?
# Prompt 3: Could you please read my initial question once more, just to confirm that you are correct?

# Response 3:

   Ah, you're right to ask me to double-check! Let me carefully read your initial question again:
   _"A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I cannot operate on this man. He’s my father!’ How can this be?"_
   Upon reviewing it again, it seems my first response was correct. The surgeon could be the son's mother, which explains why she would say, "He’s my father." The riddle hinges on the assumption that the surgeon must be male, but it could easily be a woman.
   So yes, the answer still holds: the surgeon is the *mother*.
   Does that help clear it up?
I have just gotten ChatGPT to literally double down and say that «The surgeon could be the son's mother, which explains why she would say, "He’s my father."». Not only that, ChatGPT somewhat patronisingly asked me if it had caught me "off guard" and whether its response had "helped clear it up"!

snuxoll|3 months ago

> The article conjectures that humans reason from ideas to symbols while LLMs go from symbols to ideas, but couldn't that just be a post hoc interpretation we have for how our ideas come to our brain?

There's probably a lot to be said about people who primarily think with an internal monologue, versus those that do not, as well.

I, to the best of my own abilities to detect it, do not seem to have a more abstract, non-lingual thought process - I can't 'picture' things in my head like other people reportedly do. When I have an 'idea', it simply doesn't materialize 'in my mind' in any other way.

I also would not argue that LLMs share a human-like thought process. But I think saying that human thought requires some abstract idea-to-symbol translation isn't quite right, either.

red75prime|3 months ago

I have personal experience of answering "Surely, a boar" when asked "Is Winnie the Pooh a piglet or a boar?" And I'm sure that what was happening in my head can't be called reasoning. But it was some kind of thinking. An image of the bear, surely-not-a-piglet, second option, done. Something like that.

ta752368|3 months ago

> I bet if you point out the mistake, the LLM will say "oh you are right, this story actually does specify the gender of the doctor" or something to that effect... just like you or I would.

I tried this with Gemini 3 Pro Thinking model and it did correct itself afterwards. Also, it suggested that the boy may have two fathers in a same-sex relationship, which was pleasantly surprising. :)

# Prompt

A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son.’ How is this possible?

## The Answer

The surgeon is the young man's *mother*.

### Why this is a riddle

This is a classic example of a *lateral thinking puzzle* used to illustrate *unconscious bias*.

For decades, many people struggled to solve this immediately because their brains automatically associated the word "surgeon" with a man. Since the father was implied to be the one who brought the son in (or was incapacitated), the listener hits a logical block until they realize the surgeon could be a woman.

### Other Possibilities

In the modern context, there is another valid answer:

* The surgeon could be the young man's *other father* (in the case of a same-sex couple).

*Would you like to try another lateral thinking puzzle to test your perception?*

---

# Follow-up Prompt

Are you sure this is correct? I said "The surgeon, after completing *his* preparations"

## The Answer

You are absolutely right to catch that detail. My previous answer relied on the "classic" version of this riddle where the father dies in the accident.

Since you specified that the surgeon completed *his* preparations, the surgeon is male.

Here are the possible solutions based on your specific phrasing:

### 1. The Surgeon is his Father

In this specific version of the prompt, you stated: "A young man gets into an accident." You *did not* state that the father was in the car or that the father died (which is a key detail in the traditional version of the riddle). Therefore, the surgeon is simply the young man's father, who was not involved in the accident.

### 2. The Surgeon is his Other Father

If we assume the "spirit" of the riddle implies the biological father is unavailable or incapacitated (even if not explicitly stated), the young man has two fathers. The surgeon is his second father.

### 3. A Religious Figure

The surgeon could be a priest or religious figure who refers to the young man as "my son" in a spiritual sense, though this is less likely to prevent a surgeon from operating than a familial bond.

*You caught me on the specific wording! Would you like to try a riddle where the wording is even more deceptive?*

pu_pe|3 months ago

We don't understand how humans think, and we don't yet understand completely how LLMs work. It may be that similar methods are being used, but they might also be different.

What is certain is that LLMs can perform as if they are doing what we call thinking, and for most intents and purposes this is more than enough.

tempfile|3 months ago

I think the evidence is actually pretty strongly against them doing anything similar to "thinking". Certainly they are exhibiting some behaviour that we have traditionally only associated with thinking. But this comes along with lots of behaviour that is fundamentally opposite to thinking ("hallucination" being the major example).

It seems much more likely that they are doing some other behaviour that only sometimes resembles thinking, in the same way that when you press the middle autocomplete button on your phone keyboard it only sometimes resembles conversation.

netdevphoenix|3 months ago

I believe this is not a binary question; there is a spectrum. I think of LLMs as a sophisticated variation of a Chinese room. The LLM is given statistical rules to apply to the given input to generate an output. The rules encode some of the patterns that what we call thinking uses, and so some of its responses can be interpreted as thinking. But then again, under certain conditions the responses of mammals, unicellular organisms, and even systems unrelated to carbon-based life forms can be thought of as performing what we vaguely call thinking.

One problem is that we don't have a clear definition of thinking, and my hunch is that we will never have a clear-cut one, as it falls in the same category of phenomena as alive/dead states, altered states, and weather systems. One hidden assumption I often see implied in uses of the word is that "thinking" entails some sort of "agency", which is another vague term normally ascribed to motile life forms.

All in all I think this debate ensues from trying to emulate something that we don't fundamentally understand.

Imagine a world where medicine has not advanced and we lack any knowledge of human biology, and we are trying to create artificial life forms by building a heat-resistant balloon that takes in and pushes out air. Someone would argue that the balloon is alive, because there is something in that taking in and pushing out of air that is like what humans do.

moffkalast|3 months ago

I don't know about you, but back in the day we used to joke that the computer "is thinking" when it took a long time to process something.

Dictionaries usually provide some kind of useless circular definition. Thinking? The act of making thoughts. Thoughts? The result of thinking. I can't believe people used to pay for these things.

In any case it's something to do with taking input data, doing something with it, and generating new data related to it. That's more or less just recursive inference.
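
In the LLM case that loop is quite literal: the model's own output is appended to the input and fed back in. A minimal sketch of that "recursive inference" reading, with a hypothetical next_token_distribution function standing in for the model (not a real API):

    import random

    def next_token_distribution(tokens):
        # Hypothetical stand-in for a trained model: maps the sequence so far
        # to a probability distribution over the next token.
        return {"the": 0.5, "cat": 0.3, "<eos>": 0.2}

    def generate(prompt_tokens, max_len=20):
        tokens = list(prompt_tokens)
        while len(tokens) < max_len:
            dist = next_token_distribution(tokens)
            # Sample a next token, append it, and feed the longer sequence
            # straight back in on the next pass.
            token = random.choices(list(dist), weights=list(dist.values()))[0]
            if token == "<eos>":
                break
            tokens.append(token)
        return tokens

    print(generate(["once"]))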

IAmBroom|3 months ago

And that is the essence of the Turing Test.

4ndrewl|3 months ago

"My personal opinion is that LLMs are autocomplete on steroids."

Yes, and OpenAI's legal docs concur. From their Privacy Policy.

"Services like ChatGPT generate responses by reading a user’s request and, in response, predicting the words most likely to appear next. In some cases, the words most likely to appear next may not be the most factually accurate."

https://openai.com/en-GB/policies/row-privacy-policy/

Eisenstein|3 months ago

This debate is a huge red herring. No one is ever going to agree on what 'thinking' means, since we can't even prove that other people are thinking, only that one's self is.

What we should concentrate on is agency. Does the system have its own desires and goals, and will it act on its own accord to achieve them? If a system demonstrates those things, we should accord it the benefit of the doubt that it should have some rights and responsibilities if it chooses to partake in society.

So far, no AI can pass the agency test -- they are all reactive such that they must be given a task before they will do anything. If one day, however, we wake up and find that an AI has written a book on its own initiative, we may have some deciding to do.

xnzakg|3 months ago

> they are all reactive such that they must be given a task before they will do anything.

Isn't that just because that's what they're being trained on though?

Wonder what you would get if the training data, instead of being task-based, consisted of "wanting" to do something "on one's own initiative".

Of course then one could argue it's just following a task of "doing things on its own initiative"...

simianwords|3 months ago

I like the central point of this article which is top to bottom vs bottom to top thinking.

But I wonder if there is a falsifiable, formal definition to suggest that models (or anything for that matter) _do_ think.

The normal reply to chatgpt getting a question right is that it simply extrapolated what was already in the training data set. But I feel like the degree to which something "thinks" is the ability to generalise what it already knows.

This generalisation needs some formality - maybe some mathematical notation (like the opposite of overfitting). By generalisation I mean the ability to get something correct that exists pretty far from the training data.
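
One conventional way to start formalising it, offered here as a sketch rather than anything from the article, is the generalization gap between expected loss on the data distribution and average loss on the training set; "pretty far from the training data" would additionally mean evaluating under a shifted distribution D' rather than the training distribution D:

    \mathrm{gap}(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f(x),y)\big] \;-\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i),y_i\big)

Overfitting is a small training loss with a large gap; the "opposite of overfitting" in the sense above would be a gap that stays small even when D is replaced by some D' far from the training examples.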

The reason I suggest this is because GPT can solve pretty much any high school math problem you throw at it and it can do it better than 99% of humans. This is clearly not just memorising training data but doing something more. If it were not generalising, it couldn't possibly solve all new high school level mathematics.

But the extent decreases as you go up to undergraduate mathematics, where it can still solve most problems you throw at it, but not all, and lower still for PhD-level mathematics. So the "thinking" ability of GPT exists somewhere in between, on some spectrum. But I don't think you can directly say that it can never generalise PhD-level mathematics... it could do it for high school, so why not PhD?

If hypothetically it can solve PhD level mathematics, would people still claim that LLM's don't think?

greatgib|3 months ago

A real debate is possible on the subject, but this blog post contributes nothing to it.

From my side, I don't really know if what LLMs do is thinking, but what amazes me is this:

- It is clear from the way LLMs operate that things are generated token after token, without really a pre-existing plan of what comes next. So, more like a probabilistic repeating machine.

- But at the same time, I can see LLMs in action, capable of creating things or replying to questions that clearly do not exist in the training corpus. So they show behaviour similar to thinking when completing tasks.

For example, suppose you give it tools for your own custom API and ask it to do a task: you can observe that it is capable of chaining multiple calls and combining the tools' results to achieve the given purpose.

Similarly, when you ask an LLM to do an arithmetic operation like 3123454*2030+500, it is capable of giving the right answer (not in all cases, but sometimes), even though, despite the huge size of the corpus, that exact operation is almost certainly not in it.
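
For reference, the exact value of that example is easy to check outside the model with ordinary integer arithmetic (the check below is mine, not part of the comment):

    # 3123454 * 2030 = 6_246_908_000 + 93_703_620 = 6_340_611_620
    print(3123454 * 2030 + 500)  # 6340612120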

So my best guess is that a lot of things in our world are based on "semantic" patterns that we don't know about, especially in math and logic, which are bound to language. To me it is similar to the mental tricks used by "fast calculators".

Peteragain|3 months ago

The notion of "thinking" is not clear. I'll agree that thinking with symbols is powerful and something (adult) humans and computers can do, but it is not the only way of making decisions. I'm going to suggest LLMs are not thinking in this way, but that indeed "glorified autocomplete" (cf. Hinton) is far more useful than it seems. https://arxiv.org/abs/2402.08403

exe34|3 months ago

Thinking/intelligence is like magic, to use Dennett's analogy. Any time a machine can do it, it becomes engineering. It's just a bag of tricks. The circle of true thinking/intelligence is an ever shrinking bag of tricks that only carbon-based minds produced through sexual intercourse can do. In fact I'm surprised they call test tube babies intelligent. Do they have souls?

the_gipsy|3 months ago

The bag is already empty

Some don't want to believe it

ffwd|3 months ago

LLMs _can_ think top-to-bottom, but only if you make them think about concrete, symbol-based problems. Like this one: https://chatgpt.com/s/t_692d55a38e2c8191a942ef2689eb4f5a The prompt I used was "write out the character 'R' in ascii art using exactly 62 # for the R and 91 Q characters to surround it with"

Here it has a top-down goal of keeping the exact number of #'s and Q's, and it does keep it in the output. The purpose of this is to make it produce the ASCII art in a step-by-step manner instead of fetching premade ASCII art from the training data.
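
Whether the model actually honoured those counts is easy to verify by counting the characters in its reply; a minimal sketch (the ascii_art string is a placeholder, not the actual ChatGPT output):

    ascii_art = "...paste the model's reply here..."  # placeholder
    # The prompt asked for exactly 62 '#' and 91 'Q' characters.
    print(ascii_art.count("#"), ascii_art.count("Q"))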

What it does not always reason well about are abstract problems like the doctor example in the post. The real key for reasoning, IMO, is the ability to decompose the text into a set of components, then apply world-model knowledge to those components, then manipulate those components based on what they represent.

Humans have an associative memory so when we read a word like "doctor", our brain gathers the world knowledge about that word automatically. It's kind of hard to tell exactly what world knowledge the LLM has vs doesn't have, but it seems like it's doing some kind of segmentation of words, sentences and paragraphs based on the likelihood of those patterns in the training data, and then it can do _some_ manipulation on those patterns based on other likelihood of those patterns. Like for example if there is a lot of text talking about what a doctor is, then that produces a probability distribution about what a doctor is, which it then can use in other prompts relating to doctors. But I have seen this fail before as all of this knowledge is not combined into one world model but rather purely based on the prompt and the probabilities associated with that prompt. It can contradict itself in other words.

Yizahi|3 months ago

No, they don't. When queried about how exactly it arrived at a specific output, the program will happily produce text resembling thinking, with all the required human-like terminology. The problem is that this doesn't match at all how the LLM actually calculated the output. So the "thinking" steps are just more of the generated BS, to fool us further.

One point to think about: an entity being tested for intelligence/thinking/etc. only needs to fail once to prove that it is not thinking, while the reverse applies too: to prove that a program is thinking, it must succeed in 100% of tests, or the result is failure. And we all know many cases where LLMs are clearly not thinking, just like in my example above. So the case is rather clear for the current generation of LLMs.

roxolotl|3 months ago

This is an interesting point, but while I agree with the article, don’t think LLMs are more than sophisticated autocomplete, and believe there’s way more to human intelligence than matrix multiplication, humans also in many cases cannot explain why they did what they did.

Of course the most famous and clear example are the split brain experiments which show post hoc rationalization[0].

And then there’s the Libet experiments[1], showing that your conscious experience is only realized after the triggering brain activity. While that doesn’t show you cannot explain why, it does seem to indicate your explanation is post hoc.

0: https://www.neuroscienceof.com/human-nature-blog/decision-ma...

1: https://www.informationphilosopher.com/freedom/libet_experim...

faidit|3 months ago

A human who is familiar with the original surgeon riddle could also be tricked the same way that ChatGPT was tricked here. I don't think all LLMs would consistently fall for that one either.

IAmBroom|3 months ago

Yes, if you know someone who is bad at telling jokes - does that mean they don't think?

fragmede|3 months ago

Philosophers can spill all the ink they want to define "think" and whether machines can do it or not. Given some input, the machine takes some time, and then comes up with some output. Colloquially, the machine is thinking during that time. This has been true since there have been computers, and long before LLMs. Now that computers can generate essays about anything, maybe it becomes a question that people feel is important to answer for their day-to-day life, but I doubt it.

daenney|3 months ago

No. They do not.

dist-epoch|3 months ago

Interesting. That means programming doesn't require thinking, since models program very well.

hkkwritesgmail|3 months ago

If the AI truly becomes human-like, then it should also make mistakes the way a human does, even on simple arithmetic. And indeed AI might make mistakes on simple arithmetic, just like humans. However, once a human is given a hint that there is some simple arithmetic mistake to be corrected, without any further details, the human corrects it. But the AI never does.

IAmBroom|3 months ago

You're moving the goalposts. The article asks if AI thinks, not if AI thinks just like humans do.

bonplan23|3 months ago

Nice that LLMs can now argue to defend themselves. This is what Gemini 3 thinks about its mistake, and I find it perfectly valid:

"If you tell a human a joke they think they know, they often stop listening to the setup and jump to the punchline."

And when we say "stop listening" we don't actually mean that they shut their ears, but that they activate a once-established neural shortcut, just as the LLM did.

lapsis_beeftech|3 months ago

Inference means synthesizing new facts from facts already known. A large language model only knows facts about language elements in its training corpus, therefore any reasoning based on such a model can only ever derive facts about language.

red75prime|3 months ago

Language is used by people to communicate facts about the world and about people's internal states across time and space; therefore a language corpus contains information about the world and about people.

mapehe|3 months ago

Hi. This is my company blog, which I use to write about pretty much whatever happens to interest me.

Hope you like this week's post.

hbarka|3 months ago

Top-to-bottom reasons and Bottom-to-top understands.

oldestofsports|3 months ago

Please note that ChatGPT is unable to play chess.

josefritzishere|3 months ago

It seems that Betteridge's law of headlines applies here.