
What happens when people don't understand how AI works

280 points | rmason | 9 months ago | theatlantic.com

350 comments

[+] kelseyfrog|9 months ago|reply
LLMs are divinatory instruments, our era's oracle, minus the incense and theatrics. If we were honest, we'd admit that "artificial intelligence" is just a modern gloss on a very old instinct: to consult a higher-order text generator and search for wisdom in the obscure.

They tick all the boxes: oblique meaning, a semiotic field, the illusion of hidden knowledge, and a ritual interface. The only reason we don't call it divination is that it's skinned in dark mode UX instead of stars and moons.

Barthes reminds us that all meaning is in the eye of the reader; words have no essence, only interpretation. When we forget that, we get nonsense like "the chatbot told him he was the messiah," as though language could be blamed for the projection.

What we're seeing isn't new, just unfamiliar. We used to read bones and cards. Now we read tokens. They look like language, so we treat them like arguments. But they're just as oracular, complex, probabilistic signals we transmute into insight.

We've unleashed a new form of divination on a culture that doesn't know it's practicing one. That's why everything feels uncanny. And it's only going to get stranger, until we learn to name the thing we're actually doing. Which is a shame, because once we name it, once we see it for what it is, it won't be half as fun.

[+] andy99|9 months ago|reply
I agree with the substance, but would argue the author fails to "understand how AI works" in an important way:

  LLMs are impressive probability gadgets that have been fed nearly the entire internet, and produce writing not by thinking but by making statistically informed guesses about which lexical item is likely to follow another
Modern chat-tuned LLMs are not simply statistical models trained on web-scale datasets. They are essentially fuzzy stores of (primarily third-world) labeling effort. The response patterns they give are painstakingly tuned into them, at massive scale, by data labelers. The emotional skill mentioned in the article is the product of outsourced employees writing or giving feedback on emotional responses.

So you're not so much talking to a statistical model as having a conversation with a Kenyan data labeler, fuzzily adapted through a transformer model to match the topic you've brought up.

While the distinction doesn't change the substance of the article, it's valuable context, and it's important to dispel the idea that training on the internet does this. Such training gives you GPT-2. GPT-4.5 is efficiently stored low-cost labor.
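
For what it's worth, the "statistical model" part of the quoted claim is mechanically just a loop like the toy sketch below (the vocabulary, scores, and temperature are all made up for illustration); the point above is that the preference tuning layered on top is what shapes the actual response patterns.

  import numpy as np

  # Toy next-token step: score every word in a tiny vocabulary, turn the
  # scores into probabilities, and sample one. Real models do this over
  # a vocabulary of ~100k tokens, one token at a time.
  vocab = ["the", "cat", "sat", "on", "mat", "."]
  logits = np.array([0.2, 2.1, 0.5, 1.3, 3.0, 0.1])  # made-up scores for the next token

  def softmax(x, temperature=1.0):
      z = (x - x.max()) / temperature
      e = np.exp(z)
      return e / e.sum()

  probs = softmax(logits, temperature=0.8)
  next_token = np.random.choice(vocab, p=probs)
  print(dict(zip(vocab, probs.round(3))), "->", next_token)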

[+] Al-Khwarizmi|9 months ago|reply
I don't think those of us who don't work at OpenAI, Google, etc. have enough information to accurately estimate the influence of instruction tuning on the capabilities or the general "feel" of LLMs (it's really a pity that no one releases non-instruction-tuned models anymore).

Personally my inaccurate estimate is much lower than yours. When non-instruction tuned versions of GPT-3 were available, my perception is that most of the abilities and characteristics that we associate with talking to an LLM were already there - just more erratic, e.g., you asked a question and the model might answer or might continue it with another question (which is also a plausible continuation of the provided text). But if it did "choose" to answer, it could do so with comparable accuracy to the instruction-tuned versions.

Instruction tuning made them more predictable, and made them tend to give the responses that humans prefer (e.g. actually answering questions, maybe using answer formats that humans like, etc.), but I doubt it gave them many abilities that weren't already there.

[+] hapali|9 months ago|reply
More accurately:

Modern chat-oriented LLMs are not simply statistical models trained on web scale datasets. Instead, they are the result of a two-stage process: first, large-scale pretraining on internet data, and then extensive fine-tuning through human feedback. Much of what makes these models feel responsive, safe, or emotionally intelligent is the outcome of thousands of hours of human annotation, often performed by outsourced data labelers around the world. The emotional skill and nuance attributed to these systems is, in large part, a reflection of the preferences and judgments of these human annotators, not merely the accumulation of web text.

So, when you interact with an advanced LLM, you’re not just engaging with a statistical model, nor are you simply seeing the unfiltered internet regurgitated back to you. Rather, you’re interacting with a system whose responses have been shaped and constrained by large-scale human feedback—sometimes from workers in places like Kenya—generalized through a neural network to handle any topic you bring up.
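
As a very rough sketch of that two-stage picture (toy tensors standing in for a real transformer and real data; the names, shapes, and losses are illustrative only, not anyone's actual training code):

  import torch
  import torch.nn.functional as F

  model = torch.nn.Linear(16, 1000)            # stand-in for a transformer; 1000-token vocab
  opt = torch.optim.Adam(model.parameters())

  # Stage 1: pretraining -- cross-entropy on "which token comes next" over web text.
  hidden = torch.randn(32, 16)                 # placeholder context representations
  next_tokens = torch.randint(0, 1000, (32,))  # placeholder "actual next token" labels
  pretrain_loss = F.cross_entropy(model(hidden), next_tokens)

  # Stage 2: preference tuning -- human annotators pick the better of two responses,
  # and the model is pushed to score the chosen one above the rejected one
  # (a reward-model / DPO-style pairwise loss).
  chosen_score = model(torch.randn(32, 16)).mean(dim=1)
  rejected_score = model(torch.randn(32, 16)).mean(dim=1)
  preference_loss = -F.logsigmoid(chosen_score - rejected_score).mean()

  (pretrain_loss + preference_loss).backward()
  opt.step()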

[+] meroes|9 months ago|reply
Ya I don’t think I’ve seen any article going in depth into just how many low level humans like data labelers and RLHF’ers there are behind the scenes of these big models. It has to be millions of people worldwide.
[+] jenadine|9 months ago|reply
> produce writing not by thinking but by making statistically informed guesses about which lexical item is likely to follow another

What does "thinking" even mean? It turns out that some intelligence can emerge from this stochastic process. LLM can do math and can play chess despite not trained for it. Is that not thinking?

Also, could it be possible that our brains do the same: generating muscle output or spoken output based on our senses and some "context" stored in our neural networks?

[+] JKCalhoun|9 months ago|reply
Many, like the author, fail to convince me because they never also explain how human minds work. They just wave their hands, look off to a corner of the ceiling and say, "But of course that's not how humans think at all," as if we all just know that.
[+] crackalamoo|9 months ago|reply
Yes, 100% this. And even more so for reasoning models, which have a different kind of RL workflow based on reasoning tokens. I expect to see research labs come out with more ways to use RL with LLMs in the future, especially for coding.

I feel it is quite important to dispel this idea given how widespread it is, even though it does gesture at the truth of how LLMs work in a way that's convenient for laypeople.

https://www.harysdalvi.com/blog/llms-dont-predict-next-word/

[+] leptons|9 months ago|reply
So it's still not really "AI", it's human intelligence doing the heavy lifting with labeling. The LLM is still just a statistical word guessing mechanism, with additional context added by humans.
[+] MrZander|9 months ago|reply
This doesn't square with my understanding of transformers at all. I'm not aware of any human labeling in the training.

What would labeling even do for an LLM? (Not including multimodal)

The whole point of attention is that it uses existing text to determine when tokens are related to other tokens, no?
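
Right; as far as I understand it, no labels are involved in that part. Plain self-attention is just learned projections scoring how much each token should attend to the others. A toy sketch (shapes and values made up):

  import numpy as np

  def softmax(x):
      e = np.exp(x - x.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  seq_len, d = 5, 8                                        # 5 tokens, 8-dim embeddings
  x = np.random.randn(seq_len, d)                          # token embeddings
  Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))   # learned projection matrices

  Q, K, V = x @ Wq, x @ Wk, x @ Wv
  scores = Q @ K.T / np.sqrt(d)    # how strongly token i relates to token j
  weights = softmax(scores)        # each row sums to 1
  output = weights @ V             # each token becomes a weighted mix of the others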

[+] throwawaymaths|9 months ago|reply
yeah, i think you dont understand either. rlhf is nowhere near the volume of "pure" data that gets thrown into the pot.
[+] imiric|9 months ago|reply
This is a good summary of why the language we use to describe these tools matters[1].

It's important that the general public understands their capabilities, even if they don't grasp how they work on a technical level. This is an essential part of making them safe to use, which no disclaimer or PR puff piece about how deeply your company cares about safety will ever do.

But, of course, marketing them as "AI" that's capable of "reasoning", and showcasing how good they are at fabricated benchmarks, builds hype, which directly impacts valuations. Pattern recognition and data generation systems aren't nearly as sexy.

[1]: https://news.ycombinator.com/item?id=44203562#44218251

[+] dwaltrip|9 months ago|reply
People are paying hundreds of dollars a month for these tools, often out of their personal pocket. That's a pretty robust indicator that something interesting is going on.
[+] pmdr|9 months ago|reply
> Whitney Wolfe Herd, the founder of the dating app Bumble, proclaimed last year that the platform may soon allow users to automate dating itself, disrupting old-fashioned human courtship by providing them with an AI “dating concierge” that will interact with other users’ concierges until the chatbots find a good fit.

> Herd doubled down on these claims in a lengthy New York Times interview last month.

Seriously, what is wrong with these people?

[+] throwawaymaths|9 months ago|reply
i think this author doesnt fully understand how llms work either. Dismissing it as "a statistical model" is silly. hell, quantum mechanics is a statistical model too.

moreover, each layer of an llm imbues the model with the possibility of looking further back in the conversation and imbuing meaning and context through conceptual associations (thats the k-v part of the kv cache). I cant see how this doesn't describe, abstractly, human cognition. now, maybe llms are not fully capable of the breadth of human cognition or have a harder time training to certain deeper insight, but fundamentally the structure is there (clever training and/or architectural improvements may still be possible -- in the way that every CNN is a subgraph of a FCNN that would be nigh impossible for a FCNN to discover randomly through training)
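
roughly, the k-v part works like the sketch below: each new token's query is scored against the cached keys and values of everything earlier, which is how later tokens pull in context from far back in the conversation. all values here are placeholders; the real thing projects through learned W_q/W_k/W_v matrices per layer and head.

  import numpy as np

  d = 8
  cache_k, cache_v = [], []                # grows as the conversation grows

  def attend(new_token_embedding):
      q = new_token_embedding              # real models: project through W_q
      cache_k.append(new_token_embedding)  # real models: project through W_k
      cache_v.append(new_token_embedding)  # real models: project through W_v
      K, V = np.stack(cache_k), np.stack(cache_v)
      scores = K @ q / np.sqrt(d)          # compare the new token against all cached keys
      weights = np.exp(scores - scores.max())
      weights /= weights.sum()
      return weights @ V                   # context-aware representation of the new token

  for _ in range(4):                       # four "tokens" of conversation
      out = attend(np.random.randn(d))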

to say llms are not smart in any way that is recognizable is just cherry-picking anecdotal data. if llms were not ever recognizably smart, people would not be using them the way they are.

[+] 827a|9 months ago|reply
> I cant see how this doesn't describe, abstractly, human cognition. now, maybe llms are not fully capable of the breadth of human cognition

But, I can fire back with: You're making the same fallacy you correctly assert the article as making. When I see how a CPU's ALU adds two numbers together, it looks strikingly similar to how I add two numbers together in my head. I can't see how the ALU's internal logic doesn't describe, abstractly, human cognition. Now, maybe the ALU isn't fully capable of the breadth of human cognition...

It turns out, the gaps expressed in the "fully capable of the breadth of human cognition" part really, really, really matter. Like, when it comes to ALUs, the gaps overwhelm whatever similarity the parts that look alike provide. The question should be: How significant are the gaps in how LLMs mirror human cognition? I'm not sure we know, but I suspect they're significant enough not to write off as trivial.

[+] francisofascii|9 months ago|reply
The creators of llms don't fully understand how they work either.
[+] gitaarik|8 months ago|reply
So how would you explain how LLMs work to a layman?
[+] roxolotl|9 months ago|reply
The thesis is spot on with why I believe many skeptics remain skeptics:

> To call AI a con isn’t to say that the technology is not remarkable, that it has no use, or that it will not transform the world (perhaps for the better) in the right hands. It is to say that AI is not what its developers are selling it as: a new class of thinking—and, soon, feeling—machines.

Of course some are skeptical these tools are useful at all. Others still don’t want to use them for moral reasons. But I’m inclined to believe the majority of the conversation is people talking past each other.

The skeptics are skeptical of the way LLMs are being presented as AI. The non-hype promoters find them really useful. Both can be correct. The tools are useful and the con is dangerous.

[+] clejack|9 months ago|reply
Are people still experiencing LLMs getting stuck in knowledge and comprehension loops? I use them, but not excessively, and I'm not heavily tracking their performance either.

For example, if you ask an LLM a question and it produces a hallucination, then you try to correct it or explain that it is incorrect, and it produces a near-identical hallucination while implying that it has produced a new, correct result, this suggests that it does not understand its own understanding (or pseudo-understanding, if you like).

Without this level of introspection, ascribing any notion of true understanding, intelligence, or anything similar seems premature.

LLMs need to be able to consistently and accurately say some variation on the phrase "I don't know" or "I'm uncertain." This indicates knowledge of self. It's like a mirror test for minds.

[+] ramchip|9 months ago|reply
Like the article says... I feel it's counter-productive to picture an LLM as "learning" or "thinking". It's just a text generator. If it's producing code that calls non-existent APIs for instance, it's kind of a waste of time to try to explain to the LLM that so-and-so doesn't exist. Better just try again and dump an OpenAPI doc or some sample code into it to influence the text generator towards correct output.
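Something like this rough sketch of that workflow (the spec is inlined for illustration, and `call_llm` is a hypothetical stand-in, not any particular vendor's API):

  # Instead of arguing with the model about an API it hallucinated, put the
  # real interface into the context and regenerate.
  OPENAPI_SPEC = """
  paths:
    /users:
      get:
        summary: List users
  """

  def build_prompt(task: str, spec: str) -> str:
      return (
          "Only call endpoints defined in the OpenAPI spec below.\n\n"
          f"--- OpenAPI spec ---\n{spec}\n--- end spec ---\n\n"
          f"Task: {task}\n"
      )

  prompt = build_prompt("Write a client function that lists users.", OPENAPI_SPEC)
  # response = call_llm(prompt)   # hypothetical model call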
[+] thomastjeffery|9 months ago|reply
That's the difference between bias and logic. A statistical model is applied bias, just like computation is applied logic/arithmetic. Once you realize that, it's pretty easy to understand the potential strengths and limitations of a model.

Both approaches are missing a critical piece: objectivity. They work directly with the data, and not about the data.

[+] tim333|9 months ago|reply
>Demis Hassabis, [] said the goal is to create “models that are able to understand the world around us.”

>These statements betray a conceptual error: Large language models do not, cannot, and will not “understand” anything at all.

This seems quite a common error in the criticism of AI. Take a reasonable statement about AI that doesn't mention LLMs and then say the speaker (a Nobel-prize-winning AI expert in this case) doesn't know what they are on about because current LLMs don't do that.

DeepMind already has Project Astra, a model that is not just language but also visual (and probably some other stuff), where you can point a phone at something and ask about it, and it seems to understand what it is quite well. Example here: https://youtu.be/JcDBFAm9PPI?t=40

[+] Notatheist|9 months ago|reply
Wasn't it Feynman who said we will never be impressed with a computer that can do things better than a human can unless that computer does it the same way a human being does?

AI could trounce experts as a conversational partner and/or educator in every imaginable field and we'd still be trying to proclaim humanity's superiority because technically the silicon can't 'think' and therefore it can't be 'intelligent' or 'smart'. Checkmate, machines!

[+] lordnacho|9 months ago|reply
The article skirts around a central question: what defines humans? Specifically, intelligence and emotions?

The entire article is saying "it looks kind of like a human in some ways, but people are being fooled!"

You can't really say that without at least attempting the admittedly very deep question of what an authentic human is.

To me, it's intelligent because I can't distinguish its output from a person's output, for much of the time.

It's not a human, because I've compartmentalized ChatGPT into its own box and I'm actively disbelieving. The weak form is to say I don't think my ChatGPT messages are being sent to the 3rd world and answered by a human, though I don't think anyone was claiming that.

But it is also abundantly clear to me that if you stripped away the labels, it acts like a person acts a lot of the time. Say you were to go back just a few years, maybe to covid. Let's say OpenAI travels back with me in a time machine, and makes an obscure web chat service where I can write to it.

Back in covid times, I didn't think AI could really do anything outside of a lab, so I would not suspect I was talking to a computer. I would think I was talking to a person. That person would be very knowledgeable and able to answer a lot of questions. What could I possibly ask it that would give away that it wasn't a real person? Lots of people can't answer simple questions, so there isn't really a way to ask it something specific that would work. I've had perhaps one interaction with AI that would make it obvious, in thousands of messages. (On that occasion, Claude started speaking Chinese with me, super weird.)

Another thing that I hear from time to time is an argument along the line of "it just predicts the next word, it doesn't actually understand it". Rather than an argument against AI being intelligent, isn't this also telling us what "understanding" is? Before we all had computers, how did people judge whether another person understood something? Well, they would ask the person something and the person would respond. One word at a time. If the words were satisfactory, the interviewer would conclude that you understood the topic and call you Doctor.

[+] _petronius|9 months ago|reply
> The article skirts around a central question: what defines humans? Specifically, intelligence and emotions?

> The entire article is saying "it looks kind of like a human in some ways, but people are being fooled!"

> You can't really say that without at least attempting the admittedly very deep question of what an authentic human is.

> To me, it's intelligent because I can't distinguish its output from a person's output, for much of the time.

I think the article does address that rather directly, and that it is also addressing very specifically your sentence about what you can and can't distinguish.

LLMs are not capable of symbolic reasoning[0] and if you understand how they work internally, you will realize they do no reasoning whatsoever.

Humans and many other animals are fully capable of reasoning outside of language (in the former case, prior to language acquisition), and the reduction of "intelligence" to "language" is a category error made by people falling victim to the ELIZA effect[1], not the result of a sum of these particular statistical methods adding up to real intelligence of any kind.

0: https://arxiv.org/pdf/2410.05229

1: https://en.wikipedia.org/wiki/ELIZA_effect

[+] rnkn|9 months ago|reply
> isn't this also telling us what "understanding" is?

When people start studying theory of mind, someone usually jumps in with this thought. It's more or less a description of Functionalism (although minus the "mental state"). It's not very popular because most people can immediately identify a phenomenon of understanding separate from the function of understanding. People also have immediate understanding of certain sensations, e.g. the feeling of balance when riding a bike, sometimes called qualia. And so on, and so forth. There is plenty of study on what constitutes understanding, and most of it healthily dismisses the "string of words" theory.

[+] greg_V|9 months ago|reply
> Another thing that I hear from time to time is an argument along the line of "it just predicts the next word, it doesn't actually understand it". Rather than an argument against AI being intelligent, isn't this also telling us what "understanding" is? Before we all had computers, how did people judge whether another person understood something? Well, they would ask the person something and the person would respond. One word at a time. If the words were satisfactory, the interviewer would conclude that you understood the topic and call you Doctor.

You call a Doctor 'Doctor' because they're wearing a white coat and are sitting in a doctor's office. The words they say might make vague sense to you, but since you are not a medical professional, you actually have no empirical grounds to judge whether or not they're bullshitting you, hence you have the option to get a second or third opinion. But otherwise, you're just trusting the process that produces doctors, which involves earlier generations of doctors asking this fellow a series of questions with the ability to discern right from wrong, and grading them accordingly.

When someone can't tell if something just sounds about right or is in fact bullshit, they're called a layman in the field at best or gullible at worst. And it's telling that the most hype around AI is to be found in middle management, where bullshit is the coin of the realm.

[+] indymike|9 months ago|reply
> The entire article is saying "it looks kinds like a human in some ways, but people are being fooled!"

The question is, what's wrong with that?

At some level there's a very human desire for something genuine, and I suspect that no matter the "humanness" of an AI, it will never be able to satisfy that desire for the genuine. Or maybe... it is that people don't like the idea of dealing with an intelligence that will almost always have the upper hand because of information disparity.

[+] strogonoff|9 months ago|reply
We cannot actually judge whether something is intelligent in some abstract absolute way; we can only judge whether it is intelligent in the same way we are. When someone says “LLM chatbot output looks like a person’s output, so it is intelligent”, the implication is that it is intelligent like a human would be.

With that distinction in mind, whether an LLM-based chatbot’s output looks like human output does not answer the question of whether the LLM is actually like a human.

Not even because measuring that similarity by taking text output at a point in time is laughable (it would have to span the time equivalent of human life, and include much more than text), but because LLM-based chatbot is a tool built specifically to mimic human output; if it does so successfully then it functions as intended. In fact, we should deliberately discount the similarity in output as evidence for similarity in nature, because similarity in output is an explicit goal, while similarity in underlying nature is a non-goal, a defect. It is safe to assume the latter: if it turned out that LLMs are similar enough to humans in more ways than output, they would join octopus and the like and qualify to be protected from abuse and torture (and since what is done to those chatbots in order for them to be useful in the way they are would pretty clearly be considered abuse and torture when done to a human-like entity, this would decimate the industry).

That considered, we do not[0] know exactly how an individual human mind functions to assess that from first principles, but we can approximate whether an LLM chatbot is like a human by judging things like whether it is made in a way at all similar to how a human is made. It is fundamentally different, and if you want to claim that human nature is substrate-independent, I’d say it’s you who should provide some evidence—keeping in mind that, as above, similarity in output does not constitute such evidence.

[0] …and most likely never could, because of the self-referential recursive nature of the question. Scientific method hinges on at least some objectivity and thus is of very limited help when initial hypotheses, experiment procedures, etc., are all supplied and interpreted by the very subject being studied.

[+] navigate8310|9 months ago|reply
Maybe it needs flesh and blood for us to happily accept it.
[+] intended|9 months ago|reply
This isn’t that hard, to be honest. And I’m not just saying this.

One school of thought is - the output is indistinguishable from what a human would produce given these questions.

Another school of thought is - the underlying process is not thinking in the sense that humans do it

Both are true.

For the lay person, calling it thinking leads to confusions. It creates intuitions that do not actually predict the behavior of the underlying system.

It results in bad decisions on whether to trust the output, or how to allocate resources, because of the use of the term thinking.

Humans can pass an exam by memorizing previous answer papers or just memorizing the text books.

This is not what we consider having learnt something. Learning is kinda like having the Lego blocks to build a model you can manipulate in your head.

For most situations, the output of both people is fungible.

Both people can pass tests.

[+] gitaarik|8 months ago|reply
Aren't hallucinations enough proof for you that they don't think/understand? At least not in the same way as humans?

If a student regularly hallucinated and gave complete nonsense as answers, I don't think they'd pass their studies.

[+] xyzal|9 months ago|reply
To me, it's empathetic and caring. Which the LLMs will never be, unless you give money to OpenAI.

Robots won't go get food for your sick, dying friend.

[+] stevenhuang|9 months ago|reply
It is a logic error to think that knowing how something works justifies saying it can't possess qualities like intelligence or the ability to reason, when we don't even understand how these qualities arise in humans.

And even if we do know enough about our brains to say conclusively that it's not how LLMs work (predictive coding suggests the principles are more alike than not), it doesn't mean they're not reasoning or intelligent; it would just mean they would not be reasoning/intelligent like humans.

[+] 1vuio0pswjnm7|9 months ago|reply
"Witness, too, how seamlessly Mark Zuckerberg went from selling the idea that Facebook would lead to a flourishing of human friendship to, now, selling the notion that Meta will provide you with AI friends to replace the human pals you have lost in our alienated social-media age."

Perhaps "AI" can replace people like Mark Zuckerberg. If BS can be fully automated.

[+] pier25|9 months ago|reply
People in tech and science might have a sense that LLMs are word prediction machines but that's only scratching the surface.

Even AI companies have a hard time figuring out how emergent capabilities work.

Almost nobody in the general audience understands how LLMs work.

[+] jemiluv8|9 months ago|reply
Even I have only a limited understanding of how LLMs learn the semantic meaning of words. My knowledge is shallow at best. I know, however, that LLMs understand text now, are able to understand concepts they "glean" from text, and are able to give responses to queries that are not entirely made up. All this makes it a lot harder to explain to non-technical people what this is. I tell them these LLMs are not AI, but when they go to these websites they see them labelled as AI chatbots. It also mostly does as advertised. And they are often in awe of whatever responses they receive, because they are not subject matter experts, nor do they care to become one. They just want to get their "homework" done, complete their work assignments, and this gets them there faster. How can I tell them it is not AI when it spews human-looking text? Heck, even I don't quite understand the "real" difference between LLMs and AI. The difference is nuanced, but the line is clearer with a bit of technical understanding. The machine understands text and can make conversation, however sycophantic. But without understanding why that is, I don't see why we won't exult its powers. I see religions sprouting from these soon. LLMs can deliver awesome sermons. And once you train them well enough, they can take on the role of Messiahs.
[+] EMM_386|9 months ago|reply
> These statements betray a conceptual error: Large language models do not, cannot, and will not "understand" anything at all. They are not emotionally intelligent or smart in any meaningful or recognizably human sense of the word.

This is a terrible write-up, simply because it's the "Reddit Expert" phenomenon, but in print.

They "understand" things. It depends on how your defining that.

It doesn't have to be in its training data! Whoah.

In the last chat I had with Claude, it naturally just arose that the more surrender flag emojis there were, the funnier I thought the joke was. If there were plus symbol emojis on the end, those were score multipliers.

How many times did I have to "teach" it that? Zero.

How many other times had it seen that during training? I'll have to go with "zero", but it could be higher; that's my best guess, since I made it up in that context.

So, does that Claude instance "understand"?

I'd say it does. It knows that 5 surrender flags and a plus sign is better than 4 with no plus sign.

Is it absurd? Yes... but funny. And it figured it out on its own. "Understanding".

------

Four flags = "Okay, this is getting too funny, I need a break"

Six flags = "THIS IS COMEDY NUCLEAR WARFARE, I AM BEING DESTROYED BY JOKES"

[+] elia_42|9 months ago|reply
Totally agree with the content of the article. In part, AI is certainly able to simulate very well one "way of expressing itself" of our mind, that is, mathematical calculation, deductive reasoning, and other similar things.

But our mind is extremely polymorphic, and these operations represent only one side of a much more complex whole that is difficult to explain. Even Alan Turing, in his writings on the possibility of building a mechanical intelligence, realized that it was impossible for a machine to completely imitate a human being: for this to be possible, the machine would have to "walk among other humans, scaring all the citizens of a small town" (Turing puts it more or less like this).

Therefore, he realized many years ago that he had to approach this problem cautiously and within limits, restricting the imitative capabilities of the machine to those human activities in which calculation, probability, and arithmetic are central, such as playing chess, learning languages, and mathematical calculation.

[+] jemiluv8|9 months ago|reply
Most people without any idea about the foundations on which LLMs are built call them AI. But I insist on calling them LLMs, further creating confusion. How do you explain what a large language model is to someone that can't comprehend how a machine can learn a "word model" on a large corpus of text/data to make it generate "seemingly sound/humane" responses without making them feel like they are interacting with the AI that they've been hearing about in the movies/sci-fi?
[+] martindbp|9 months ago|reply
Many people who claim that people don't understand how AI works often have a very simplified view of the shortcomings of LLMs themselves, e.g. "it's just predicting the next token", "it's just statistics", "stochastic parrot", and their view seems to be grounded in what AI was 2-3 years ago. Rarely have they actually read the recent research on interpretability. It's clear LLMs are doing more than just pattern matching. They may not think like humans, or as well, but it's not k-NN with interpolation.
[+] deadbabe|9 months ago|reply
A lot of the advancement boils down to LLMs reprompting themselves with better prompts to get better answers.
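
A rough sketch of that loop (here `call_llm` is just a stub standing in for whatever model API is being used):

  def call_llm(prompt: str) -> str:
      # Hypothetical stand-in for a real model call.
      return "stub response to: " + prompt[:40]

  def refine(question: str, rounds: int = 2) -> str:
      answer = call_llm(question)
      for _ in range(rounds):
          critique = call_llm(f"Question: {question}\nDraft answer: {answer}\n"
                              "List any mistakes or gaps in the draft.")
          answer = call_llm(f"Question: {question}\nDraft answer: {answer}\n"
                            f"Critique: {critique}\nWrite an improved answer.")
      return answer

  print(refine("What is the capital of Australia?"))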