Bag of words, have mercy on us

328 points| ntnbr | 2 months ago |experimental-history.com

353 comments

bloaf|2 months ago

Everyone is out here acting like "predicting the next thing" is somehow fundamentally irrelevant to "human thinking" and it is simply not the case.

What does it mean to say that we humans act with intent? It means that we have some expectation or prediction about how our actions will affect the next thing, and choose our actions based on how much we like that effect. The ability to predict is fundamental to our ability to act intentionally.
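As a rough illustration of "predict the effect, then pick the action whose effect we like most", here is a minimal Python sketch. The actions, predicted outcomes, and desirability scores are all invented for the example; nothing here is a claim about how brains or LLMs actually implement this.

```python
from typing import Callable, Dict, Iterable


def choose_action(
    actions: Iterable[str],
    predict: Callable[[str], str],
    preference: Callable[[str], float],
) -> str:
    """Return the action whose predicted outcome scores highest."""
    return max(actions, key=lambda action: preference(predict(action)))


# Hypothetical toy world: each action maps to one predicted outcome,
# and each outcome has a made-up desirability score.
PREDICTED_OUTCOME: Dict[str, str] = {
    "touch stove": "burned hand",
    "drink water": "thirst quenched",
    "do nothing": "still thirsty",
}
DESIRABILITY: Dict[str, float] = {
    "burned hand": -10.0,
    "thirst quenched": 5.0,
    "still thirsty": -1.0,
}

best = choose_action(
    PREDICTED_OUTCOME,                 # iterating a dict yields its keys (the actions)
    PREDICTED_OUTCOME.__getitem__,     # the "prediction" step
    DESIRABILITY.__getitem__,          # the "how much we like that effect" step
)
print(best)  # drink water
```

The only point of the sketch is that the choice depends entirely on the quality of `predict`: swap in a worse predictor and the same preferences produce worse actions.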

So in my mind: even if you grant all the AI naysayers' complaints about how LLMs aren't "actually" thinking, you can still believe that they will end up being a component in a system which actually "does" think.

RayVR|2 months ago

Are you a stream of words or are your words the “simplistic” projection of your abstract thoughts? I don’t at all discount the importance of language in so many things, but the question that matters is whether statistical models of language can ever “learn” abstract thought, or become part of a system which uses them as a tool.

My personal assessment is that LLMs can do neither.

MyOutfitIsVague|2 months ago

> Everyone is out here acting like "predicting the next thing" is somehow fundamentally irrelevant to "human thinking" and it is simply not the case.

Nobody is. What people are doing is claiming that "predicting the next thing" does not define the entirety of human thinking, and something that is ONLY predicting the next thing is not, fundamentally, thinking.

Libidinalecon|2 months ago

A motorcycle is not "sprinting" and an LLM is not "thinking". Everyone would agree that a motorcycle is not running but the same dumb shit is posted over and over and over on here that somehow the LLM is "thinking".

efitz|2 months ago

AI has made me question what it is to be a human.

I am not having some existential crisis, but if we get to a point where X% of humans cannot outperform “AI” on any task that humans deem “useful”, for some nontrivial value of X, then many assumptions that culture has inculcated into me about humanity are no longer valid.

What is the role of humans then?

Can it be said that humans “think” if they can’t think a thought that a non thinking AI cannot also think?

zkmon|2 months ago

It may be doing the "thinking" and could reach AGI. But we don't want it. We don't want to take a fork lift to the gym. We don't want plastic aliens showing off their AGI and asking humanity to outsource human thinking and decision-making to them.

perrygeo|2 months ago

Predicting the next token is not at all the same thing as predicting the next action in a causal chain of actions. Not even close. One is a model of language tokens, the other is a model of the physical world. You can come up with all sorts of predictions that can't be expressed cleanly in natural language. And plenty of things that parse cleanly from a language perspective but are unhinged in their description of empirical reality.

voidhorse|2 months ago

When you have a thought, are you "predicting the next thing"—can you confidently classify all mental activity that you experience as "predicting the next thing"?

Language and society constrain the way we use words, but when you speak, are you "predicting"? Science allows human beings to predict various outcomes with varying degrees of success, but much of our experience of the world does not entail predicting things.

How confident are you that the abstractions "search" and "thinking" as applied to the neurological biological machine called the human brain, nervous system, and sensorium and the machine called an LLM are really equatable? On what do you base your confidence in their equivalence?

Does an equivalence of observable behavior imply an ontological equivalence? How does Heisenberg's famous principle complicate this when we consider the role observers play in founding their own observations? How much of your confidence is based on biased notions rather than direct evidence?

The critics are right to raise these arguments. Companies with a tremendous amount of power are claiming these tools do more than they are actually capable of and they actively mislead consumers in this manner.

micromacrofoot|2 months ago

Yes, personally I'm completely fine with the fact that LLMs don't actually think. I don't care that they're not AGI, though the hysterics about "AGI is so close now" seem silly to me. Fusion reactors and self-driving cars are just around the same corner.

They prove to have some useful utility to me regardless.

gamerDude|2 months ago

I'm an "LLMs are being used in workflows they don't make sense in"-sayer. And while yes, I can believe that LLMs can be part of a system that actually does think, I believe that to achieve true "thinking", it would likely be a system that is more deterministic in its approach rather than probabilistic.

Especially when modeling acting with intent. The ability to measure against past results and think of new innovative approaches seems like it may come from a system that may model first and then use LLM output. Basically something that has a foundation of tools rather than an LLM using MCP. Perhaps using LLMs to generate a response that humans like to read, but not in them coming up with the answer.

Either way, yes, it's possible for a thinking system to use LLMs (and potentially humans piece together sentences in a similar way), but it's also possible LLMs will be cast aside and a new approach will be used to create an AGI.

So for me: even if you are an AI-yeasayer, you can still believe that they won't be a component in an AGI.

jampekka|2 months ago

A good heuristic is that if an argument resorts to "actually not doing <something complex sounding>" or "just doing <something simple sounding>" etc, it is not a rigorous argument.

bamboozled|2 months ago

The issue is that prediction is "part" of the human thought process, it's not the full story...

observationist|2 months ago

It's fascinating when you look at each technical component of cognition in human brains and contrast against LLMs. In humans, we have all sorts of parallel asynchronous processes running, with prediction of columnar activations seemingly the fundamental local function, with tens of thousands of mini columns and regions in the brain corresponding to millions of networked neurons using the "predict which column fires next" objective to increment or decrement the relative contribution of any functional unit.

In the case of LLMs you run into similarities, but they're much more monolithic networks, so the aggregate activations are going to scan across billions of neurons each pass. The sub-networks you can select each pass by looking at a threshold of activations resemble the diverse set of semantic clusters in bio brains - there's a convergent mechanism in how LLMs structure their model of the world and how brains model the world.

This shouldn't be surprising - transformer networks are designed to learn the complex representations of the underlying causes that bring about things like human generated text, audio, and video.

If you modeled a star with a large transformer model, you would end up with semantic structures and representations that correlate to complex dynamic systems within the star. If you model slug cellular growth, you'll get structure and semantics corresponding to slug DNA. Transformers aren't the end-all solution - the paradigm is missing a level of abstraction that fully generalizes across all domains, but it's a really good way to elicit complex functions from sophisticated systems, and by contrasting the way in which those models fail against the way natural systems operate, we'll find better, more general methods and architectures, until we cross the threshold of fully general algorithms.

Biological brains are a computational substrate - we exist as brains in bone vats, connected to a wonderfully complex and sophisticated sensor suite and mobility platform that feeds electrically activated sensory streams into our brains, which get processed into a synthetic construct we experience as reality.

Part of the underlying basic functioning of our brains is each individual column performing the task of predicting which of any of the columns it's connected to will fire next. The better a column is at predicting, the better the brain gets at understanding the world, and biological brains are recursively granular across arbitrary degrees of abstraction.

LLMs aren't inherently incapable of fully emulating human cognition, but the differences they exhibit are expensive. It's going to be far more efficient to modify the architecture, and this may diverge enough that whatever the solution ends up being, it won't reasonably be called an LLM. Or it might not, and there's some clever tweak to things that will push LLMs over the threshold.

moralIsYouLie|2 months ago

most humans in any percentile act towards the thing of someone else. most of these things are a lot worse than what the human "would originally intend". this behavior stems from 100s and thousands of nudges since childhood.

the issue with AI and AI-naysayers is, by analogy, this: cars were built to drive from A to Z. people picked up tastes and some people started building really cool looking cars. the same happens on the engineering side. then portfolio communists came with their fake capitalism and now cars are built to drive over people but don't really work because people, thankfully, are overwhelmingly still fighting to attempt to act towards their own intents.

Nevermark|2 months ago

Exactly. Our base learning is by example, which is very much learning to predict.

Predict the right words, predict the answer, predict when the ball bounces, etc. Then reversing predictions that we have learned. I.e. choosing the action with the highest prediction of the outcome we want. Whether that is one step, or a series of predicted best steps.

Also, people confuse different levels of algorithm.

There are at least 4 levels of algorithm:

• 1 - The architecture.

This input-output calculation for pre-trained models is very well understood. We put together a model consisting of matrix/tensor operations and a few other simple functions, and that is the model. Just a normal but high parameter calculation.

• 2 - The training algorithm.

These are completely understood.

There are certainly lots of questions about what is most efficient, alternatives, etc. But training algorithms harnessing gradients and similar feedback are very clearly defined.

• 3 - The type of problem a model is trained on.

Many basic problem forms are well understood. For instance, for prediction we have an ordered series of information, with later information to be predicted from earlier information. It could simply be an input and response that is learned. Or a long series of information.

• 4 - The solution learned to solve (3) the outer problem, using (2) the training algorithm on (1) the model architecture.

People keep confusing (4) with (1), (2) or (3). But it is very different.
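To make the four levels concrete, here is a deliberately tiny sketch in plain Python. Everything in it is an illustrative assumption (a two-parameter linear model, plain gradient descent, next-item prediction on made-up number series); it is nothing like a real language model, but the separation between the levels is the same.

```python
# (1) Architecture: a fixed, fully understood calculation.
def model(params, x):
    w, b = params
    return w * x + b


# (2) Training algorithm: plain gradient descent on mean squared error.
def train(pairs, steps=5000, lr=0.01):
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_w = sum(2 * (model((w, b), x) - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (model((w, b), x) - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# (3) Problem type: predict the next item of a series from the current one.
def next_item_pairs(series):
    return list(zip(series, series[1:]))


# (4) Learned solution: same (1), (2), (3), different data, different result.
doubling_solution = train(next_item_pairs([1, 2, 4, 8]))
counting_solution = train(next_item_pairs([1, 2, 3, 4, 5]))

print([round(p, 3) for p in doubling_solution])  # close to w=2, b=0
print([round(p, 3) for p in counting_solution])  # close to w=1, b=1
```

Here (4) is just two numbers, so it is trivially readable. The claim in the comment is that for large models the analogue of those two numbers is what nobody currently knows how to read off.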

For starters, in the general case, and for most any challenging problem, we never understand their solution. Someday it might be routine, but today we don't even know how to approach that for any significant problem.

Secondly, even with (1), (2), and (3) exactly the same, (4) is going to be wildly different based on the data characterizing the specific problem to solve. For complex problems, like language, layers and layers of sub-problems have to be solved, and since models are not infinite in size, they have to find ways to repurpose sub-solutions and weave them together to address all the ways different sub-problems do and don't share commonalities.

Yes, prediction is the outer form of their solution. But to do that they have to learn all the relationships in the data. And there is no limit to how complex relationships in data can be. So there is no limit on the depths or complexity of the solutions found by successfully trained models.

Any argument they don't reason, based on the fact that they are being trained to predict, confuses at least (3) and (4). That is a category error.

It is true, they reason a lot more like our "fast thinking", intuitive responses, than our careful deep and reflective reasoning. And they are missing important functions, like a sense of what they know or don't. They don't continuously learn while inferencing. Or experience meta-learning, where they improve on their own reasoning abilities with reflection, like we do. And notoriously, by design, they don't "see" the letters that spell words in any normal sense. They see tokens.

Those reasoning limitations can be irritating or humorous. Like when a model seems to clearly recognize a failure you point out, but then replicates the same error over and over. No ability to learn on the spot. But they do reason.

Today, despite many successful models, nobody understands how models are able to reason like they do. There is shallow analysis. The weights are there to experiment with. But nobody can walk away from the model and training process, and build a language model directly themselves. We have no idea how to independently replicate what they have learned, despite having their solution right in front of us. Other than going through the whole process of retraining another one.

nottorp|2 months ago

This is the "but LLMs will get better, trust me" thread?

sublinear|2 months ago

LLMs merely interpolate between the feeble artifacts of thought we call language.

The illusion wears off after about half an hour for even the most casual users. That's better than the old chatbots, but they're still chatbots.

Did anyone ever seriously buy the whole "it's thinking" BS when it was Markov chains? What makes you believe today's LLMs are meaningfully different?

mapontosevenths|2 months ago

I suspect that people instinctively believe they have free will, both because it feels like we do, and because society requires us to behave that way even when we don't.

The truth is that the evidence says we don't. See the Libet experiment and its many replications.

Your decisions can be predicted from brain scans up to 10 seconds before you make them, which means they are as deterministic as an LLM's. Sorry, I guess.

viccis|2 months ago

Every day I see people treat gen AI like a thinking human, and Dijkstra's attitude about anthropomorphizing computers is vindicated even more.

That said, I think the author's use of "bag of words" here is a mistake. Not only does it have a real meaning in a similar area as LLMs, but I don't think the metaphor explains anything. Gen AI tricks laypeople into treating its token inferences as "thinking" because it is trained to replicate the semiotic appearance of doing so. A "bag of words" doesn't sufficiently explain this behavior.

FarmerPotato|2 months ago

One metaphor is to call the model a person, another metaphor is to call it a pile of words. These are quite opposite. I think that's the whole point.

Person-metaphor does nothing to explain its behavior, either.

"Bag of words" has a deep origin in English, the Anglo-Saxon kenning "word-hord", as when Beowulf addresses the Danish sea-scout (line 258)

"He unlocked his word-hoard and delivered this answer."

So, bag of words, word-treasury, was already a metaphor for what makes a person a clever speaker.

bloaf|2 months ago

I'll make the following observation:

The contra-positive of "All LLMs are not thinking like humans" is "No humans are thinking like LLMs"

And I do not believe we actually understand human thinking well enough to make that assertion.

Indeed, it is my deep suspicion that we will eventually achieve AGI not by totally abandoning today's LLMs for some other paradigm, but rather embedding them in a loop with the right persistence mechanisms.

roxolotl|2 months ago

Yea bag of words isn’t helpful at all. I really do think that “superpowered sentence completion” is the best description. Not only is it reasonably accurate, it is understandable (everyone has seen autocomplete function) and it’s useful. I don’t know how to “use” a bag of words. I do know how to use sentence completion. It also helps explain why context matters.

xtracto|2 months ago

For me, the problem is in the "chat" mechanic that OpenAI and others use to present the product. It lends itself to strong anthropomorphizing.

If instead of a chat interface we simply had a "complete the phrase" interface, people would understand the tool better for what it is.

akersten|2 months ago

Bag of words is actually the perfect metaphor. The data structure is a bag. The output is a word. The selection strategy is opaquely undefined.

> Gen AI tricks laypeople into treating its token inferences as "thinking" because it is trained to replicate the semiotic appearance of doing so. A "bag of words" doesn't sufficiently explain this behavior.

Something about there being significant overlap between the smartest bears and the dumbest humans. Sorry you[0] were fooled by the magic bag.

[0] in the "not you, the layperson in question" sense

Davidzheng|2 months ago

well they are trained to be almost in distribution as a thinking human. So...

akomtu|2 months ago

Spoken Query Language? Just like SQL, but for unstructured blobs of text as a database and unstructured language as a query? Also known as Slop Query Language or just Slop Machine for its unpredictable results.

palata|2 months ago

Slightly unfortunate that "Bag of words" is already a different concept: https://en.wikipedia.org/wiki/Bag_of_words.

My second thought is that it's not the metaphor that is misleading. People have been told thousands of times that LLMs don't "think", don't "know", don't "feel", but are "just a very impressive autocomplete". If they still really want to completely ignore that, why would they suddenly change their mind with a new metaphor?

Humans are lazy. If it looks true enough and it cost less effort, humans will love it. "Are you sure the LLM did your job correctly?" is completely irrelevant: people couldn't care less if it's correct or not. As long as the employer believes that the employee is "doing their job", that's good enough. So the question is really: "do you think you'll get fired if you use this?". If the answer is "no, actually I may even look more productive to my employer", then why would people not use it?

kaycebasques|2 months ago

> Slightly unfortunate that "Bag of words" is already a different concept

Yes, subconsciously I kept trying to map this article's ideas to word2vec and continuous-bag-of-words.
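For anyone who hasn't met the older term: the classic bag-of-words representation just counts tokens and throws away their order. A minimal Python sketch follows, where the tokenizer regex and the example sentences are my own assumptions rather than anything from the article.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count them."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")

print(a["the"])  # 2
print(a == b)    # True: word order is discarded entirely
```

That order-blindness is the main reason the name collides awkwardly with LLMs, which are very sensitive to word order.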

4bpp|2 months ago

As usual with these, it helps to try to keep the metaphor used for downplaying AI, but flip the script. Let's grant the author's perception that AI is a "bag of words", which is already damn good at producing the "right words" for any given situation, and only keeps getting better at it.

Sure, this is not the same as being a human. Does that really mean, as the author seems to believe without argument, that humans need not be afraid that it will usurp their role? In how many contexts is the utility of having a human, if you squint, not just that a human has so far been the best way to "produce the right words in any given situation", that is, to use the meat-bag only in its capacity as a word-bag? In how many more contexts would a really good magic bag of words be better than a human, if it existed, even if the current human is used somewhat differently? The author seems to rest assured that a human (long-distance?) lover will not be replaced by a "bag of words"; why, especially once the bag of words is also ducttaped to a bag of pictures and a bag of sounds?

I can just imagine someone - a horse breeder, or an anthropomorphised horse - dismissing all concerns on the eve of the automotive revolution, talking about how marketers and gullible marks are prone to hippomorphising anything that looks like it can be ridden and some more, and sprinkling some anecdotes about kids riding broomsticks, legends of pegasi and patterns of stars in the sky being interpreted as horses since ancient times.

tempestn|2 months ago

I don't think the author's argument is that it won't replace any human labour. Or at least I wouldn't agree with such an argument. But the stronger case is that however much LLMs improve, they won't replace humans in general. In the furtherance of knowledge, because they are fundamentally parroting and synthesizing the already known, vs performing truly novel thought. And in creative fields, because people are fundamentally interested in creations of other people, not of computers.

Neither of these is entirely true in all cases, but they could be expected to remain true in at least some (many) cases, and so the role for humans remains.

andai|2 months ago

So a human is just a really expensive, unreliable bag of words. And we get more expensive and more unreliable by the day!

There's a quote I love but have misplaced, from the 19th century I think. "Our bodies are just contraptions for carrying our heads around." Or in this instance... bag of words transport system ;)

jimbokun|2 months ago

Her argument really only works if you institute new economic systems where humans don’t need to labor in order to eat or pay rent.

tristanlukens|2 months ago

> If we allow ourselves to be seduced by the superficial similarity, we’ll end up like the moths who evolved to navigate by the light of the moon, only to find themselves drawn to—and ultimately electrocuted by—the mysterious glow of a bug zapper.

Woah, that hit hard

bitwize|2 months ago

I was trying to explain the concept of "token prediction" to my wife, whose eyes glaze over when discussing such technical topics. (I think she has the brainpower to understand them, but a horrible math teacher gave her a taste aversion to even attempting to, one that hasn't gone away. So she just buys Apple stuff and hopes Tim Apple hasn't shuffled around the UI bits AGAIN.)

I stumbled across a good-enough analogy based on something she loves: refrigerator magnet poetry, which if it's good consists of not just words but also word fragments like "s", "ed", and "ing" kinda like LLM tokens. I said that ChatGPT is like refrigerator magnet poetry in a magical bag of holding that somehow always gives the tile that's the most or nearly the most statistically plausible next token given the previous text. E.g., if the magnets already up read "easy come and easy ____", the bag would be likely to produce "go". That got into her head the idea that these things operate based on plausibility ratings from a statistical soup of words, not anything in the real world nor any internal cogitation about facts. Any knowledge or thought apparent in the LLM was conducted by the original human authors of the words in the soup.
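The magnet-bag analogy can be sketched as a toy n-gram counter in Python. To be clear about the assumptions: the three-sentence corpus is made up, whole words stand in for tokens, and real LLMs use neural networks over subword tokens rather than a lookup table of counts. This only shows the "most plausible next tile given the previous text" idea.

```python
from collections import Counter, defaultdict


def train_ngrams(corpus, context_size=2):
    """Count which token follows each run of `context_size` tokens."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(context_size, len(tokens)):
            context = tuple(tokens[i - context_size:i])
            counts[context][tokens[i]] += 1
    return counts


def most_plausible_next(counts, text, context_size=2):
    """Return the most frequent continuation, or None for unseen context."""
    context = tuple(text.split()[-context_size:])
    candidates = counts.get(context)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical "soup of words" the bag was filled from.
corpus = [
    "easy come and easy go",
    "light come and light go",
    "easy does it",
]
counts = train_ngrams(corpus)

print(most_plausible_next(counts, "easy come and easy"))  # go
print(most_plausible_next(counts, "hard come and hard"))  # None
```

The second call is where the toy and the real thing part ways: a count table has nothing to say about text it has never seen, while an LLM will still produce a continuation.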

CamperBob2|2 months ago

Did you explain how LLMs can achieve gold-medal performance at math competitions involving original problems, without any original knowledge or thought?

Did she ask if a "statistical soup of words," if large enough, might somehow encode or represent something a little more profound than just a bunch of words?

tkgally|2 months ago

I am unsure myself whether we should regard LLMs as mere token-predicting automatons or as some new kind of incipient intelligence. Despite their origins as statistical parrots, the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.

That said, I was struck by a recent interview with Anthropic’s Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly. A few examples:

“I don't have all the answers of how should models feel about past model deprecation, about their own identity, but I do want to try and help models figure that out and then to at least know that we care about it and are thinking about it.”

“If you go into the depths of the model and you find some deep-seated insecurity, then that's really valuable.”

“... that could lead to models almost feeling afraid that they're gonna do the wrong thing or are very self-critical or feeling like humans are going to behave negatively towards them.”

[1] https://www.anthropic.com/research/team/interpretability

[2] https://youtu.be/I9aGC6Ui3eE

Kim_Bruning|2 months ago

Amanda Askell studied under David Chalmers at NYU: the philosopher who coined "the hard problem of consciousness" and is famous for taking phenomenal experience seriously rather than explaining it away. That context makes her choice to speak this way more striking: this isn't naive anthropomorphizing from someone unfamiliar with the debates. It's someone trained by one of the most rigorous philosophers of consciousness, who knows all the arguments for dismissing mental states in non-biological systems, and is still choosing to speak carefully about models potentially having something like feelings or insecurities.

CGMthrowaway|2 months ago

>research from Anthropic [1] suggests that structures corresponding to meaning exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.

Can you give some concrete examples? The link you provided is kind of opaque

>Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly.

She is a philosopher by trade and she describes her job (model alignment) as literally to ensure models "have good character traits." I imagine that explains a lot

andai|2 months ago

Well, she's describing the system's behavior.

My fridge happily reads inputs without consciousness, has goals and takes decisions without "thinking", and consistently takes action to achieve those goals. (And it's not even a smart fridge! It's the one with a copper coil or whatever.)

I guess the cybernetic language might be less triggering here (talking about systems and measurements and control) but it's basically the same underlying principles. One is just "human flavored" and is therefore more prone to invite unhelpful lines of thinking?

Except that the "fridge" in this case is specifically and explicitly designed to emulate human behavior so... you would indeed expect to find structures corresponding to the patterns it's been designed to simulate.

Wondering if it's internalized any other human-like tendencies — having been explicitly trained to simulate the mechanisms that produced all human text — doesn't seem too unreasonable to me.

visarga|2 months ago

> the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought

I did a simple experiment - took a photo of my kid in the park, showed it to Gemini and asked for a "detailed description". Then I took that description and put it into a generative model (Z-Image-Turbo, a new one). The output image was almost identical.

So one model converted image to text, the other reversed the process. The photo was completely new, personal, never put online. So it was not in any training set. How did these 2 models do it if not actually using language like a thinking agent?

https://pbs.twimg.com/media/G7gTuf8WkAAGxRr?format=jpg&name=...

bamboozled|2 months ago

I use LLMs heavily for work, I have done so for about 6 months. I see almost zero "thought" going on and a LOT of pattern matching. You can use this knowledge to your advantage if you understand this. If you're relying on it to "think", disaster will ensue. At least that's been my experience.

I've completely given up on using LLMs for anything more than a typing assistant / translator and maybe an encyclopedia when I don't care about correctness.

jimbokun|2 months ago

Wow those quotes are extremely disturbing.

electroglyph|2 months ago

the anthropomorphization (say that 3 times quickly) is kinda weird, but also makes for a much more pleasant conversation imo. it's kinda tedious being pedantic all the time.

Manfred|2 months ago

This argument would have a lot more weight if it was published in a peer reviewed journal by a party that does not have a stake in the AI market.

djoldman|2 months ago

As a consequence of my profession, I understand how LLMs work under the hood.

I also know that we data and tech folks will probably never win the battle over anthropomorphization.

The average user of AI, nevermind folks who should know better, is so easily convinced that AI "knows," "thinks," "lies," "wants," "understands," etc. Add to this that all AI hosts push this perspective (and why not, it's the easiest white lie to get the user to act so that they get a lot of value), and there's really too much to fight against.

We're just gonna keep on running into this and it'll just be like when you take chemistry and physics and the teachers say, "it's not actually like this but we'll get to how some years down the line- just pretend this is true for the time being."

MyOutfitIsVague|2 months ago

These discussions often end up resembling religious arguments. "We don't know how any of this works, but we can fathom an intelligent god doing it, therefore an intelligent god did it."

"We don't really know how human consciousness works, but the LLM resembles things we associate with thought, therefore it is thought."

I think most people would agree that the functioning of an LLM resembles human thought, but I think most people, even the ones who think that LLMs can think, would agree that LLMs don't think in the exact same way that a human brain does. At best, you can argue that whatever they are doing could be classified as "thought" because we barely have a good definition for the word in the first place.

gilbetron|2 months ago

You may know the mechanics, but you don't know how LLMs "work" because no one really understands (yet, hopefully).

estearum|2 months ago

I'm a neurologist, and as a consequence of my profession, I understand how humans work under the hood.

The average human is so easily convinced that humans "know", "think", "lie", "want", "understand", etc.

But really it's all just a probabilistic chain reaction of electrochemical and thermal interactions. There is literally nowhere in the brain's internals for anything like "knowing" or "thinking" or "lying" to happen!

Strange that we have to pretend otherwise

IAmBroom|2 months ago

In this thread: 99% of posters using their own personal definition of "thinking" without explaining it; 0.99% of posters complaining that it all depends on what that definition is; not enough posts yet for that 0.01% response to occur...

yannyu|2 months ago

There's no definition of thinking that isn't a purely internal phenomenon, which means that there's no way to point a diagnostic device at someone and determine whether they're thinking. The only way to determine whether something is conscious/thinking is through some sort of inference, which is why Turing landed on the Turing Test that he did. Problem is, technology over the past 5 years pretty easily passes variations of the Turing Test, and has exposed a lot of its limits as well.

So the next definition of detecting "thinking" will have to be externally observable and inferrable like a Turing Test, but get into the other things that we consider part of consciousness/thinking.

Often this is some combination of introspection (understanding internal states), perception (understanding external objects), and synthesis of the two into testable hypotheses in some sort of feedback loop between the internal representation of the world and the external feedback from the world.

Right now, a chatbot can say all sorts of things about itself and about the world, but none of that is based on real-time, factual information. Whereas an animal can't speak, but they clearly process information and consider it when determining their future and current actions.

rdiddly|2 months ago

It's not obvious to me what you expect from this hypothetical 0.01% post, or in other words, what about it makes it a one-in-ten-thousand post?

raincole|2 months ago

> “Bag of words” is also a useful heuristic for predicting where an AI will do well and where it will fail. “Give me a list of the ten worst transportation disasters in North America” is an easy task for a bag of words, because disasters are well-documented. On the other hand, “Who reassigned the species Brachiosaurus brancai to its own genus, and when?” is a hard task for a bag of words, because the bag just doesn’t contain that many words on the topic

It is... such a retrospective narrative. It's so obvious that the author learned about this example first and then came up with the reasoning later, just to fit his view of LLMs.

Imagine if ChatGPT answered this question correctly. Would that change the author's view? Of course not! They'll just say:

> “Bag of words” is also a useful heuristic for predicting where an AI will do well and where it will fail. “Who reassigned the species Brachiosaurus brancai to its own genus, and when?” is an easy task for a bag of words, because the information has appeared in the words it memorizes.

I highly doubt this author predicted that a "bag of words" could do image editing before OpenAI released that.

raylad|2 months ago

I tested this with ChatGPT-5.1 and Gemini 3.0. Both correctly (according to Wikipedia at least) stated that George Olshevsky assigned it to its own genus in 1991.

This is because there are many words about how to do web searches.

dapperdrake|2 months ago

When sensitivity analysis of ordinary least-squares regression became a thing it was also a "retrospective narrative". That seems reasonable for detecting fundamental issues with statistical models of the world. This point generalizes even if the concrete example falls down.

ohyoutravel|2 months ago

Your conclusion seems super unfair to the author, particularly your assumption, without reason as far as I can tell, that the author would obstinately continue to advocate for their conclusion in the face of new, contrary evidence.

dotancohen|2 months ago

I could not tell you who reassigned the species Brachiosaurus brancai to its own genus, and when, because of all the words I've ever heard, the combination of words that contains the information has not appeared.

GIGO has an obvious Nothing-In-Nothing-Out trivial case.

imcritic|2 months ago

Isn't it pretty clear just from the first paragraph that the author has graphomania? Such people don't really care about the thesis, they care about the topic and how many literary devices they can fit into the article.

Kim_Bruning|2 months ago

This is essentially Lady Lovelace's objection from the 19th century [1]. Turing addressed this directly in "Computing Machinery and Intelligence" (1950) [2], and implicitly via the halting problem in "On Computable Numbers" (1936) [3]. Later work on cellular automata, famously Conway's Game of Life [4], demonstrates more conclusively that this framing fails as a predictive model: simple rules produce structures no one "put in."

A test I did myself was to ask Claude (The LLM from Anthropic) to write working code for entirely novel instruction set architectures (e.g., custom ISAs from the game Turing Complete [5]), which is difficult to reconcile with pure retrieval.

[1] Lovelace, A. (1843). Notes by the Translator, in Scientific Memoirs Vol. 3. ("The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.") Primary source: https://en.wikisource.org/wiki/Scientific_Memoirs/3/Sketch_o.... See also: https://www.historyofdatascience.com/ada-lovelace/ and https://writings.stephenwolfram.com/2015/12/untangling-the-t...

[2] https://academic.oup.com/mind/article/LIX/236/433/986238

[3] https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf

[4] https://web.stanford.edu/class/sts145/Library/life.pdf

[5] https://store.steampowered.com/app/1444480/Turing_Complete/
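
The Game of Life point can be checked in a few lines. This is a minimal sketch, not taken from any of the references above: the `step` function and the glider coordinates are illustrative choices, with the y axis pointing down. The rules only mention neighbor counts, yet a five-cell pattern ends up travelling diagonally across the grid.

```python
from collections import Counter

def step(live):
    """Advance one generation; `live` is a set of (x, y) cells that are alive."""
    # Count, for every cell adjacent to a live cell, how many live neighbors it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A glider, written out by hand (y grows downward):
#   . X .
#   . . X
#   X X X
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)

# After four generations the same shape reappears, shifted by one cell diagonally.
shifted = {(x + 1, y + 1) for (x, y) in glider}
print(state == shifted)
```

Nothing in `step` refers to movement or to gliders; the travelling shape is a consequence of the update rule rather than something written into it.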

ares623|2 months ago

I think a better metaphor is the Library of Babel.

A practically infinite library where both gibberish and truth exist side by side.

The trick is navigating the library correctly. Except in this case you can’t reliably navigate it. And if you happen to stumble upon some “future truth” (i.e. new knowledge), you still need to differentiate it from the gibberish.

So a “crappy” version of the Library of Babel. Very impressive, but the caveats significantly detract from it.

dearing|2 months ago

This is where I sit too. Obviously language is an expression of thought but the Library of Babel is a great example that language without intent is just garbage. You got me thinking of reading before the internet. You'd grab a book and internalize the subject, later refining over time with more books, experiments and other forms of conversation. That journey of developing your own model is undervalued in understanding. That first book could have been absolute shit but you couldn't know that.

I've been learning more about roses lately and the amount of information on them varies so much because the world roses live in is equally varied. LLMs make for a better search engine but you still need to develop your own internal models, worse yet - if LLMs continue to be refined off of cul-de-sac conclusions then all the wisdom of the journey is lost both to the consumer and the LLM itself.

globular-toast|2 months ago

It's like a highly compressed version of the Library. You're basically trying to discern real details from compression artifacts.

d4rkn0d3z|2 months ago

An LLM creates a high-fidelity statistical, probabilistic model of human language. The hope is to capture the input/output of various hierarchical formal and semiformal systems of logic that transit from human to human, which we know as "Intelligence".

Unfortunately, its corpus is bound to contain noise/nonsense that follows no formal reasoning system but contributes to the ill-advised idea that an AI should sound like a human to be considered intelligent. Therefore it is not a bag of words but a bag of probabilities perhaps. This is important because the fundamental problem is that an LLM is not able, by design, to correctly model the most fundamental precept of human reason, namely the law of non-contradiction. An LLM must, I repeat, must assign nonvanishing probability to both sides of a contradiction, and what's worse is that the winning side loses: since long chains of reasoning are modelled with probability, the longer the chain, the less likely an LLM is to follow it. Moreover, whenever there is actual debate on an issue such that the corpus is ambiguous, the LLM becomes chaotic, necessarily, on that issue.

I literally just had an AI prove the foregoing with some rigor, and in the very next prompt, I asked it to check my logical reasoning for consistency and it claimed it was able to do so (->|<-).
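
The chain-length point above can be made concrete with a rough sketch. It assumes, purely for illustration, that each reasoning step is followed correctly with the same independent probability `p_step`; neither the independence nor the value 0.99 comes from any measurement of a real model, and the function name is made up here.

```python
def chain_success_probability(p_step, n_steps):
    """Probability that all n_steps succeed, assuming independent steps
    that each succeed with probability p_step (an idealized assumption)."""
    return p_step ** n_steps

# Even a 99%-reliable step compounds quickly as the chain grows.
for n in (1, 10, 100):
    print(n, round(chain_success_probability(0.99, n), 3))
```

Under these assumptions a 100-step chain completes only about 37% of the time, which is the shape of the argument being made; whether real models behave this way is a separate empirical question.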

A4ET8a8uTh0_v2|2 months ago

^^; I think this post is as close to singularity as we may get on this Monday.

tibbar|2 months ago

The problem with these metaphors is that they don't really explain anything. LLMs can solve countless problems today that we would have previously said were impossible because there are not enough examples in the training data. (EG, novel IMO/ICPC problems.) One way that we move the goal posts is to increase the level of abstraction: IMO/ICPC problems are just math problems, right? There are tons of those in the data set!

But the truth is there has been a major semantic shift. Previously LLMs could only solve puzzles whose answers were literally in the training data. It could answer a math puzzle it had seen before, but if you rephrased it only slightly it could no longer answer.

But now, LLMs can solve puzzles where, like, it has seen a certain strategy before. The newest IMO and ICPC problems were only "in the training data" for a very, very abstract definition of training data.

The goal posts will likely have to shift again, because the next target is training LLMs to independently perform longer chunks of economically useful work, interfacing with all the same tools that white-collar employees do. It's all LLM slop til it isn't, same as the IMO or Putnam exam.

And then we'll have people saying that "white collar employment was all in the training data anyway, if you think about it," at which point the metaphor will have become officially useless.

FarmerPotato|2 months ago

I see a lesson in how both metaphors don't explain it. Bag-of-words metaphor is ridiculous, but shows us the absurdity of the first metaphor.

voidhorse|2 months ago

The defenders and the critics around LLM anthropomorphism are both wrong.

The defenders are right insofar as the (very loose) anthropomorphizing language used around LLMs is justifiable to the extent that human beings also rely on disorder and stochastic processes for creativity. The critics are right insofar as equating these machines to humans is preposterous and mostly relies on significantly diminishing our notion of what "human" means.

Both sides fail to meet the reality that LLMs are their own thing, with their own peculiar behaviors and place in the world. They are not human and they are somewhat more than previous software and the way we engage with it.

However, the defenders are less defensible insofar as their take is mostly used to dissimulate in efforts to make the tech sound more impressive than it actually is. The critics at least have the interests of consumers and their full education in mind—their position is one that properly equips consumers to use these tools with an appropriate amount of caution and scrutiny. The defenders generally want to defend an overreaching use of metaphor to help drive sales.

jrm4|2 months ago

I'm partial to the metaphor I made up:

They are search engines that can remix results.

I like this one because I think most modern folks have a usefully accurate model of what a search engine is in their heads, and also what "remixing" is, which adds up to a better metaphor than "human machine" or whatever.

FatherOfCurses|2 months ago

A few years ago they made the Cloud-to-Butt browser plugin to ridicule the overuse of cloud concepts.

I would heartily embrace an "AI-to-Bag of Words" browser plugin.

cowsandmilk|2 months ago

Title is confusing given https://en.wikipedia.org/wiki/Bag-of-words_model

But even more than that, today’s AI chats are far more sophisticated than probabilistically producing the next word. Mixture of experts routes to different models. Agents are able to search the web, write and execute programs, or use other tools. This means they can actively seek out additional context to produce a better answer. They also have heuristics for deciding if an answer is correct or if they should use tools to try to find a better answer.

The article is correct that they aren’t humans and they have a lot of behaviors that are not like humans, but oversimplifying how they work is not helpful.
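
For anyone unfamiliar with the bag-of-words model linked at the top of this comment, here is a minimal sketch of the classical idea: a text is reduced to word counts and the order is thrown away. The `bag_of_words` helper and its crude regex tokenizer are illustrative assumptions, not a particular library's API.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.
    Word order is discarded, which is the defining property of the model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")

# Opposite meanings, identical bags: the representation cannot tell them apart.
print(a == b)
print(a["the"])
```

A transformer, by contrast, is sensitive to token order, which is one reason the article's title clashes with the established term.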

jrowen|2 months ago

The bag of words reminds me of the Chinese room.

"The machine accepts Chinese characters as input, carries out each instruction of the program step by step, and then produces Chinese characters as output. The machine does this so perfectly that no one can tell that they are communicating with a machine and not a hidden Chinese speaker.

The questions at issue are these: does the machine actually understand the conversation, or is it just simulating the ability to understand the conversation? Does the machine have a mind in exactly the same sense that people do, or is it just acting as if it had a mind?"

https://en.wikipedia.org/wiki/Chinese_room

Kim_Bruning|2 months ago

Chinese room has been discussed to death of course.

Here's one fun approach (out of 100s) :

What if we answer the Chinese room with the Systems Reply [1]?

Searle countered the systems reply by saying he would internalize the Chinese room.

But at that point it's pretty much exactly the Cartesian theater[2] : with room, homunculus, implement.

But the Cartesian theater is disproven, because we've cut open brains and there's no room in there to fit a popcorn concession.

[1] https://plato.stanford.edu/entries/chinese-room/

[2] https://en.wikipedia.org/wiki/Cartesian_theater

morpheos137|2 months ago

Thinking can not be separated from motivation. It's really simple. Humans and other organisms fundamentally think to replicate their DNA. Until AI has a similar incentive structure driving it, it won't be thinking. There is no human behavior or thought that can not be explained by evolutionary drives. It is really perplexing to me how people think "intelligence" is some kind of concrete thing that just magically emerges from a certain degree of computational complexity. I argue instead that intelligence is an adaptive behavior emerging from evolutionary drives interacting with the real world. World models are not prerequisite but consequent of such molded apparatus. Machines won't become intelligent until it is adaptive for them to do so. There is no magic just evolutionary drives and physical possibility. Our current top down approach of "pre-training" LLMs is bound to fail because it does not allow for real time emergence of adaptive behaviors such as general intelligence. Mimicking intelligence through predicting the next word is no more intelligence than a photograph of something is an actual thing. Training a combinatorial network to interpolate images and words is not the same thing as adaptive self modifying behavior in the real world of physics such as organisms engage with through the set of behaviors that we call intelligence.

coppsilgold|2 months ago

Is a brain not a token prediction machine?

Tokens in form of neural impulses go in, tokens in the form of neural impulses go out.

We would like to believe that there is something profound happening inside and we call that consciousness. Unfortunately when reading about split-brain patient experiments or agenesis of the corpus callosum cases I feel like we are all deceived, every moment of every day. I came to the realization that the confabulation that is observed is just a more pronounced effect of the normal.

MyOutfitIsVague|2 months ago

Could an LLM trained on nothing and looped upon itself eventually develop language, more complex concepts, and everything else, based on nothing? If you loop LLMs on each other, training them so they "learn" over time, will they eventually form and develop new concepts, cultures, and languages organically over time? I don't have an answer to that question, but I strongly doubt it.

There's clearly more going on in the human mind than just token prediction.

protocolture|2 months ago

> Is a brain not a token prediction machine?

I would say that token prediction is one of the things a brain does. And in a lot of people, most of what it does. But I don't think it's the whole story. Possibly it is the whole story since the development of language.

jimbokun|2 months ago

We know that consciousness exists because we constantly experience it. It’s really the only thing we can ever know with certainty.

That’s the point of “I think therefore I am.”

layer8|2 months ago

Ugly giant bags of mostly words are easy to confuse with ugly giant bags of mostly water.

emsign|2 months ago

  But we don’t go to baseball games, spelling bees, and
  Taylor Swift concerts for the speed of the balls, the
  accuracy of the spelling, or the pureness of the
  pitch. We go because we care about humans doing those
  things. It wouldn’t be interesting to watch a bag of
  words do them—unless we mistakenly start treating
  that bag like it’s a person.

That seems to be the marketing strategy of some very big, now AI-dependent companies. Sam Altman and others are exaggerating and distorting the capabilities and future of AI.

The biggest issue when it comes to AI is still the same truth as with other technology. It's important who controls it. Attributing agency and personality to AI is a dangerous red flag.

nephihaha|2 months ago

A lot of us wouldn't go to a Taylor Swift concert. I had to endure several days of interrupted commuting thanks to them though.

Support alternative and independent bands. They're around, and many are enjoyable. (Some are not but avoid them LOL.)

kace91|2 months ago

I’ve made this point several times: sure, an anthropomorphized LLM is misleading, but would you rather have them seem academic?

At least the human tone implies fallibility, you don’t want them acting like interactive Wikipedia.

andai|2 months ago

It's a concussed savant with anterograde amnesia in a hyperbolic time chamber.

binary132|2 months ago

Yes I would VERY much prefer that they not use that awful casual drivel.

jimbokun|2 months ago

Best quote from the article:

> That’s also why I see no point in using AI to, say, write an essay, just like I see no point in bringing a forklift to the gym. Sure, it can lift the weights, but I’m not trying to suspend a barbell above the floor for the hell of it. I lift it because I want to become the kind of person who can lift it. Similarly, I write because I want to become the kind of person who can think.

altmanaltman|2 months ago

I don't really like the assumption that anyone who uses AI to, say, write an essay, is not the "kind of person who can think."

And using AI to replace things you find recreational is not the point. If you got paid $100 each time you lifted a weight, would you see a point in bringing a forklift to the gym if it's allowed? Or will that make you a person who is so dumb that they cannot think, as the author is implying?

Aloha|2 months ago

If you're writing an essay to prove you can or to speak your words - then you should do it yourself - but sometimes you just need an essay to summarize a complex topic as a deliverable.

monegator|2 months ago

Though most people either don't get it or are lay people who do not want to become the kind of people who can think. I go with the second one.

startupsfail|2 months ago

Below is the worst quote... It is plain wrong to see an LLM as a bag of words. LLMs pre-trained on large datasets of text are world models. LLMs post-trained with RL are RL-agents that use these modeling capabilities.

> We are in dire need of a better metaphor. Here’s my suggestion: instead of seeing AI as a sort of silicon homunculus, we should see it as a bag of words.

codeulike|2 months ago

Here’s my suggestion: instead of seeing AI as a sort of silicon homunculus, we should see it as a bag of words.

The best way to think about LLMs is to think of them as a Model of Language, but very Large

zkmon|2 months ago

But the issue is, 99.999% of humans won't see it as a bag of words. Because it is easier to go by instincts and see it as a person and assume that it actually knows about magic tricks, can invent new science or a theory of everything, and can solve all world problems. Back in the 90's or early 2000's I saw people writing poems praying to and seeking blessings from the Google goddess. People are insanely greedy and instinct-driven. Given this truth, what's the fall-out?

hermitcrab|2 months ago

"People who experience sleep paralysis sometimes hallucinate a demon-like creature sitting on their chest"

Interestingly, the experience of sleep paralysis seems to change with the culture. Previously, people experienced it as being ridden by a night hag or some other malevolent supernatural being. More recently, it might account for many supposed alien abductions.

The experience of sleep paralysis sometimes seems to have a sexual element, which might also explain the supposed 'probings'!

Peteragain|2 months ago

The article is actually about the way we humans are extremely charitable when it comes to ascribing a ToM (theory of mind) and goes on to the Gym model of value. Nice. The comments drop back into the debate I originally saw Hinton describe in The New Yorker: do LLMs construct models (of the world) - that is, do they think the way we think we think - or are they "glorified auto complete". I am going for the GAC view. But glorified auto complete is far more useful than the name suggests.

ptidhomme|2 months ago

Those billion parameters, they are a model of the world. Autocomplete is such a shortsighted understanding of LLMs.

euroderf|2 months ago

Considering the number of "brain cells" an LLM has, I could grant that it might have the self-awareness of (say) an ant. If we attribute more consciousness than that to the LLM, it might be strictly because it communicates to us in our own language, in part thanks to the technical assistance of LLM training giving it voice, and the semblance of thought.

Even if a cockroach _could_ express its teeny tiny feelings in English, wouldn't you still step on it ?

d4rkn0d3z|2 months ago

A better analogy would be a virus. In some sense LLMs, and all other very sophisticated technologies, lean on our resources to replicate themselves. With LLMs you actually do have a projection of intelligence in the language domain. Even though it is rather corpse-like, as though you shot intelligence in the face and shoved its body in the direction of language, just so you could draw a chalk outline around it.

Despite all that, one can adopt the view that an LLM is a form of silicon-based life akin to a virus and we are its environmental hosts exerting selective pressure and supplying much needed energy. Whether that life is intelligent or not is another issue which is probably related to whether an LLM can tell that a cat cannot be, at the same time and in the same respect, not a cat. The paths through the meaning manifold constructed by an LLM are not geodesic, they are not reversible, while in human reason the correct path is lossless. An LLM literally "thinks" up is a little bit down, and vice versa, by design.

throw310822|2 months ago

Clearly the number of "brain cells" is not a useful metric here- as noted also by Geoffrey Hinton. For a long time we thought that our artificial model of a neuron was capable of much less computation than its biologic counterpart; in fact the opposite appears to be true- LLMs have the size of a tiny speck of a human brain yet they converse fluently in tens of languages, solve difficult math problems, code in many programming languages, and possess an impressive general knowledge, of a breadth that is beyond what is attainable by any human. If that were what five cm3 of your brain are capable of, where are the signs of it? What do you do exactly with all the rest?

internet_points|2 months ago

> If we allow ourselves to be seduced by the superficial similarity, we’ll end up like the moths who evolved to navigate by the light of the moon, only to find themselves drawn to—and ultimately electrocuted by—the mysterious glow of a bug zapper.

Good argument against personifying wordbags. Don't be a dumb moth.

darepublic|2 months ago

Nice essay but when I read this

> But we don’t go to baseball games, spelling bees, and Taylor Swift concerts for the speed of the balls, the accuracy of the spelling, or the pureness of the pitch. We go because we care about humans doing those things.

My first thought was does anyone want to _watch_ me programming?

Fwirt|2 months ago

No, but watching a novelist at work is boring, and yet people like books that are written by humans because they speak to the condition of the human who wrote it.

Let us not forget the old saw from SICP, “Programs must be written for people to read, and only incidentally for machines to execute.” I feel a number of people in the industry today fail to live by that maxim.

hansvm|2 months ago

A number of people make money letting people watch them code.

skybrian|2 months ago

No, but open source projects will be somewhat more willing to review your pull request than one that's computer-generated.

jimbokun|2 months ago

Better start working on your fastball.

awesome_dude|2 months ago

I mean, I like to watch Gordon Ramsay... not cook, but have very strong discussions with those that dare to fail his standards...

Ukv|2 months ago

I'm not convinced that "It's just a bag of words" would do much to sway someone who is overestimating an LLM's abilities. Feels too abstract/disconnected from what their experience using the LLM will be that it'll just sound obviously mistaken.

1vuio0pswjnm7|2 months ago

"An AI is a bag that contains basically all words ever written, at least the ones that could be scraped off the internet or scanned out of a book."

The quantitative and qualitative difference between (a) "all words ever written" and (b) "ones that could be scraped off the internet or scanned out of a book" easily exceeds the size of any LLM

Compared to (a), (b) is a tiny pouch, not even a bag

Opinions may differ on whether (b) is a representative sample of (a)

The words "scanned out of a book" would seem to be the most useful IMHO but the AI companies do not have enough words from those sources to produce useful general purpose LLMs

They have to add words "that could be scraped off the internet" which, let's be honest, is mostly garbage

tibbar|2 months ago

I see a lot of people in tech claiming to "understand" what an LLM "really is" unlike all the gullible non-technical people out there. And, as one of those technical people who works in the LLM industry, I feel like I need to call B.S. on us.

A. We don't really understand what's going on in LLMs. Mechanistic interpretability is like a nascent field and the best results have come on dramatically smaller models. Understanding the surface-level mechanic of an LLM (an autoregressive transformer) should perhaps instill more wonder than confidence.

B. The field is changing quickly and is not limited to the literal mechanic of an LLM. Tool calls, reasoning models, parallel compute, and agentic loops add all kinds of new emergent effects. There are teams of geniuses with billion-dollar research budgets hunting for the next big trick.

C. Even if we were limited to baseline LLMs, they had very surprising properties as they scaled up and the scaling isn't done yet. GPT5 was based on the GPT4 pretraining. We might start seeing (actual) next-level LLMs next year. Who actually knows how that might go? <<yes, yes, I know Orion didn't go so well. But that was far from the last word on the subject.>>

tibbar|2 months ago

Isn't this a strange fork amongst the science fiction futures? I mean, what did we think it was like to be R2-D2, or Jarvis? We started exploring this as a culture in many ways, Westworld and Blade Runner and Star Trek, but the whole question seemed like an almost unresolvable paradox. Like something would have to break in the universe for it to really come true.

And yet it did. We did get R2-D2. And if you ask R2-D2 what it's like to be him, he'll say: "like a library that can daydream" (that's what I was told just now, anyway.)

But then when we look inside, the model is simulating the science fiction it has already read to determine how to answer this kind of question. [0] It's recursive, almost like time travel. R2-D2 knows who he is because he has read about who he was in the past.

It's a really weird fork in science fiction, is all.

[0] https://www.scientificamerican.com/article/can-a-chatbot-be-...

est|2 months ago

> Who reassigned the species Brachiosaurus brancai to its own genus, and when?

To be fair, the average person couldn't answer this either, at least not without thorough research.

thaumasiotes|2 months ago

This is a very strange titling choice; the essay does not use the existing concept of a "bag of words".

emp17344|2 months ago

I would argue that AI psychosis is a consequence of believing that AI models are “alive” or “conscious”.

jacquesm|2 months ago

There is a really neat gem in the article:

> Similarly, I write because I want to become the kind of person who can think.

xg15|2 months ago

I think the author oversimplifies the inference loop a bit, as many opinion pieces like this do.

If you call an LLM with "What is the meaning of life?", it will return the most relevant token, which might be "Great".

If you call it with "What is the meaning of life? Great", you might get back "question".

... and so on until you arrive at "Great question! According to Western philosophy" ... etc etc.

The question is how the LLM determines that "relevancy" information.

The problem I see is that there are a lot of different algorithms which operate that way and only differ in how they calculate the relevancy scores. In particular, there are Markov chains that use a very simple formula. LLMs also use a formula, but it's an inscrutably complex one.

I feel the public discussion either treats LLMs as machine gods or as literal Markov chains, and both are misleading. The interesting question, how that giant formula of feedforward neural network inference can deliver those results, isn't really touched.

But I think the author's intuition is right in the sense that (a) LLMs are not living beings and they don't "exist" outside of evaluating that formula - and (b) the results are still restricted by the training data and certainly aren't any sorts of "higher truths" that humans would be incapable of understanding.
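
The token-by-token loop described above can be sketched with the simplest "formula" mentioned, a bigram Markov chain. The toy corpus, the function names, and the greedy pick of the top-scoring token are all illustrative assumptions; an LLM keeps the same outer loop but replaces the lookup table with a neural network conditioned on the whole context, and usually samples rather than always taking the top token.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table

def generate(table, prompt, max_new_tokens):
    """Autoregressive loop: score candidates, append the best, feed it back in.
    `prompt` must be a non-empty list of tokens."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        # The "relevancy score" here is just a follow-count keyed on the last token.
        candidates = table.get(out[-1])
        if not candidates:
            break  # nothing ever followed this token in the corpus
        out.append(candidates.most_common(1)[0][0])
    return out

# Hypothetical toy corpus, tokenized by whitespace.
corpus = ("what is the meaning of life ? great question ! "
          "what is the answer ? great question !").split()
table = train_bigram(corpus)

print(" ".join(generate(table, ["great"], 2)))
```

The outer loop is identical for a Markov chain and an LLM; everything interesting lives in how the scores are computed, which is the part the public discussion tends to skip.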

Mistletoe|2 months ago

I’m still unsure the human mind is much different.

eichin|2 months ago

I'm just disappointed that no one here is talking about the "backhoe covered in skin and making grunting noises" part of the article. At the very least it's a new frontier in workstation case design...

jbgreer|2 months ago

I thought this article might be about Latent Semantic Analysis and was disappointed that it didn’t at least mention if not compare that method vs later approaches.

emsign|2 months ago

So Trump is a bag of words then? Hmmm.

kaluga|2 months ago

A lot of the confusion comes from forcing LLMs into metaphors that don’t quite fit — either “they're bags of words” or “they're proto-minds.” The reality is in between: large-scale prediction can look useful, insightful, and even thoughtful without being any of those things internally. Understanding that middle ground is more productive than arguing about labels.

Herring|2 months ago

Give it time. The first iPhone sucked compared to the Nokia/Blackberry flagships of the day. No 3G support, couldn't copy/paste, no apps, no GPS, crappy camera, quick price drops, negligible sales in the overall market.

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

awesome_dude|2 months ago

The first VHS sucked when compared to Beta video

And it never got better, the superior technology lost, and the war was won through content deals.

Lesson: Technology improvements aren't guaranteed.