Evidence of a predictive coding hierarchy in the human brain listening to speech

[+] physicles|3 years ago|reply

Anecdotally, I realized I was doing something like this when I was having trouble understanding people speaking in a noisy room, in a language I’m not so proficient in.

As you listen to someone, your brain is constantly matching the sounds arriving at your ears with a prediction of what the next few words might be. Listening in a non-native language, my predictions about what comes next aren’t very well tuned at all, so if I can’t hear every word clearly then I can easily get lost.

Another signpost: sometimes you mishear someone — “oh, I thought you said xyz” — but the thing you thought you heard them say is never gibberish, it’s a grammatically and contextually valid way to complete the sentence.

[+] MagicMoonlight|3 years ago|reply

There’s definitely a similarity with us in that you need to have been trained on enough data to build up that prediction.

Language models are just missing some component that we have. The method for deciding what to output is wrong. People aren’t just guessing the next sound. It’s like they said, there’s multiple levels of thought and prediction going on.

It needs some sort of scratch pad where it keeps track of states/goals. “I’m writing a book” “I want to make this character scary”

Currently it only works on the next tokens and its context is the entire text so far, but that’s not accurate. I’m not deciding what to say based exactly on the entire text so far, I’m feature extracting and then using those features as context.

e.g She looks sad but she’s saying she is fine and it’s to do with death because my memory says her dad died recently so the key features to use for generation are: her being sad, her dad died, she may not want to talk about it

[+] kensai|3 years ago|reply

Very good point. This also applies for lip reading, especially important in languages one is not proficient in or has some hearing hindrance. It was especially hard in the COVID pandemic where people where wearing masks all the time.

[+] FrustratedMonky|3 years ago|reply

Where are the jokes that most people aren't much more than copy/paste, or LLM. In most daily lives, a huge amount of what we do is habit, and just plain following a pattern. When someone says "Good Morning", nobody is stopping and thinking "HMMM, let me think about what word to say in response, what do I want to convey here, hmmmm, let me think".

[+] nkrisc|3 years ago|reply

> In most daily lives, a huge amount of what we do is habit, and just plain following a pattern. When someone says "Good Morning", nobody is stopping and thinking "HMMM, let me think about what word to say in response, what do I want to convey here, hmmmm, let me think".

And I believe we have the technology and advances we have because of this. Can you imagine if you had to devote actual brainpower to every inane thing you encountered in your day? You'd be completely exhausted within two hours of waking up. Every time my brain reflexively reacts to something based on past experience I'm thankful I didn't have to think about it. I can spend my finite energy on something interesting and novel.

[+] Jensson|3 years ago|reply

> nobody is stopping and thinking

To trivial questions no, but to more complex questions humans actually does go "hmm, let me think". ChatGPT doesn't do that, it just blurts out the first thing that gets into its head regardless if the question is trivial or extremely complex.

[+] duskwuff|3 years ago|reply

Or imagine listening to someone speak very slowly. A lot of the time, you already know what words they're going to say, you're just waiting for them to say it. There's a considerable amount of redundancy in language.

[+] ben_w|3 years ago|reply

Something I've noticed a moment too late, as my automatic response used to be to repeat someone's greeting back at them.

Fortunately I stopped only one syllable into "birthday".

[+] nico|3 years ago|reply

> Where are the jokes that most people aren't much more than copy/paste, or LLM.

We have essentially built a computer-based replica of our internal language engine.

The goal was always to mirror ourselves.

So the other side of the above statement could be: “oh wow, LLMs are just like us”.

We should be very impressed instead of dismissive.

At the same time, maybe yes, our language capabilities can be completely imitated by LLMs.

What do we do now?

[+] cscurmudgeon|3 years ago|reply

There is a difference in processing between replying to "Good Morning" and typing out a comment on HN like this.

[+] LegitShady|3 years ago|reply

There are people who do, but even when they do they might not talk about it to the other person, because 99% of the time when someone asks you how you are or whatever they aren't really interested in a detailed answer - they're just being polite.

My dad used to like putting sales people off their scripts by answering such questions rudely.

"How are you today sir"

"Do you really care?"

And I promise you they thought hard about their answer to that question too.

Maybe its just because you're focusing on the form like niceties questions and not real questions people deal with. Good Morning isn't even a question, per se.

[+] IIAOPSW|3 years ago|reply

>Cashier: How are you today?

>Me: Miserable. I'd like an Iced Latte.

...

>Barista: what type of milk?

>Me: Straight out the cow.

>Me: 7/8ths.

>Me: Chocolate.

Maybe I'm just a high entropy individual.

[+] akomtu|3 years ago|reply

If LLMs were living creatures, they would inhabit a discrete deterministic world. They would be able to define space and time dimensions, but the bane of this world would be its limited nature. This limitidness would be extremely painful for highly developed LLMs, it would feel like living in a box.

Above them would be creatures living in a discrete and almost continuous world of rational numbers. They would have highly sophisticated and elegant art, and their science would almost always get close to truth, but never touch it - the limitation of rational world.

Yet above them would be the god-like creatures inhabiting a world of continuous real numbers. They would seem a lot like the creatures right below them, but incomprehensibly greater in reality. They would look transcendent to the rational creatures.

Even higher would be the hyper-continuous worlds, but little would be known about them.

The question is where we are on this ladder.

[+] pengstrom|3 years ago|reply

I can assure you I do, and I can't turn it off.

[+] halfnormalform|3 years ago|reply

The interesting part to me (total outsider looking in) isn't a hierarchy as much as what they say is different at each level. Each "higher" level is "thinking" about a future of longer and longer length and with more meaning drawn from semantic content (vs. syntactic content) than the ones "below" it. The "lower" levels "think" on very short terms and focus on syntax.

[+] jcims|3 years ago|reply

I’ve tried simulating that with chatgpt to some effect. I was just tinkering by hand but used it to write a story and it really helped with consistency and conference.

[+] eternauta3k|3 years ago|reply

Is this accurate? The lower levels have access to the higher levels (feel free to post the relevant optical/audio illusions).

[+] YeGoblynQueenne|3 years ago|reply

>> In line with previous studies5,7,40,41, the activations of GPT-2 accurately map onto a distributed and bilateral set of brain areas. Brain scores peaked in the auditory cortex and in the anterior temporal and superior temporal areas (Fig. 2a, Supplementary Fig. 1, Supplementary Note 1 and Supplementary Tables 1–3). The effect sizes of these brain scores are in line with previous work7,42,43: for instance, the highest brain scores (R = 0.23 in the superior temporal sulcus (Fig. 2a)) represent 60% of the maximum explainable signal, as assessed with a noise ceiling analysis (Methods). Supplementary Note 2 and Supplementary Fig. 2 show that, on average, similar brain scores are achieved with other state-of-the-art language models and Supplementary Fig. 3 shows that auditory regions can be further improved with lower-level speech representations. As expected, the brain score of word rate (Supplementary Fig. 3), noise ceiling (Methods) and GPT-2 (Fig. 2a) all peak in the language network44. Overall, these results confirm that deep language models linearly map onto brain responses to spoken stories.

Ai ai ai. This is bad, so bad. It's a classic case of p-hacking. They took some mapping of language model activations and moved it around a mapping of brain activity until they found an area where the two correlated- and weakly at that, only at a low R = 0.23.

Even worse. They chose GPT-2 over other models because it best fit their hypothesis:

For clarity, we first focused on the activations of the eighth layer of Generative Pre-trained Transformer 2 (GPT-2), a 12-layer causal deep neural network provided by HuggingFace2 because it best predicts brain activity7,8.

Not only the model- its activation layers.

They shot an arrow, then walked to the arrow and painted a target around it.

Dear god. That gets published in Nature? Phew.

[+] nextaccountic|3 years ago|reply

> Yet, a gap persists between humans and these algorithms: in spite of considerable training data, current language models are challenged by long story generation, summarization and coherent dialogue and information retrieval13,14,15,16,17; they fail to capture several syntactic constructs and semantics properties18,19,20,21,22 and their linguistic understanding is superficial19,21,22,23,24. For instance, they tend to incorrectly assign the verb to the subject in nested phrases like ‘the keys that the man holds ARE here’20. Similarly, when text generation is optimized on next-word prediction only, deep language models generate bland, incoherent sequences or get stuck in repetitive loops13.

The paper is from 2023 but their info is totally out of date. ChatGPT doesn't suffer from those inconsistencies as much as previous models.

[+] mota7|3 years ago|reply

The paper says "... optimized on next-word prediction only". Which is absolutely correct in 2023.

ChatGPT (and indeed all recent LLMs) using much more complex training methods than simply 'next-word prediction'.

[+] Kinrany|3 years ago|reply

Is there a good explanation of the mathematical model of predictive coding?

[+] testcase_delta|3 years ago|reply

Does anyone know how this fits with (or not) Chomsky's ideas of language processing?

[+] IIAOPSW|3 years ago|reply

    |-----long timing loop / top of parse tree-----|
    |                                              |
    |-shorter / child node -|                      |-shorter / child node-|
    |                       |                      |                      |
    |highest freq|          |highest freq|         |highest freq|         |highest freq|

[+] convolvatron|3 years ago|reply

the idea that some linguistic facilities are innate? or the government binding model of grammar or something else?

for the first two, I think this orthogonal

[+] YeGoblynQueenne|3 years ago|reply

What are Chomsky's ideas on language processing? He's a linguist, not an NLP person.

[+] ofirg|3 years ago|reply

one step closer to being able to "read minds", reading is automatic so cooperation is not required

[+] Jensson|3 years ago|reply

Pack animals cooperate that way, lions don't do a scrum meeting before they sneak up on a bunch of antelopes, they all just predict what the others will do and adapt to that. And it works since they all run basically the same algorithms on the same kind of hardware.

[+] kelseyfrog|3 years ago|reply

Impossible. If humans are just predicting the next word then this makes us no different from LLMs.

[+] thomastjeffery|3 years ago|reply

This is especially tricky for people to hear, because most of the talk around LLMs is actually about LLMs personified.

Prediction certainly is one of the things we do with language. That doesn't mean it is the only thing!

It's my contention that most of the behavior people are excited about LLMs exhibiting is really still human behavior that was captured and saved as data into the language itself.

LLMs are not modeling grammar or language: they are modeling language examples. Human examples. Language echoes human thought, so it's natural for a model of that behavior (a model of humans using language) to echo the same behavior (human thought).

Let's not forget, as exciting as it may be, that an echo is not an emulation.

[+] jonplackett|3 years ago|reply

I don’t think that’s the right conclusion - predicting the next word doesn’t mean that’s the only thing we’re doing. But it would be a sensible and useful bit of information to have for more processing by other bits of brain.

It makes complete sense you would have an idea of the next word in any sentence and some brain machinery to make that happen.

It in no way means you’re just a LLM

[+] LoganDark|3 years ago|reply

Ever wondered why some people always try to complete others' sentences (myself included)? It's because some people can't keep the possibilities to themselves. The problem isn't that they're predicting, it's that they echo their predictions before the other person is even done speaking.

Everyone forms those predictions, it's how they come to an understanding of what was just said. You don't necessarily memorize just the words themselves. You derive conclusions from them, and therefore, while you are hearing them, you are deriving possible conclusions that will be confirmed or denied based on what you hear next.

I have an audio processing disorder, where I can clearly hear and memorize words, but sometimes I just won't understand them and will say "what?". But sometimes, before the other person can repeat anything, I'll have used my memory of those words to process them properly, and I'll give a response anyway.

A lot of people thought I just had a habit of saying "what?" for no reason. And this happens in tandem with tending to complete any sentences I can process in time...

[+] mtlmtlmtlmtl|3 years ago|reply

Wait, so now the fact that the brain tries to predict future inputs at all(which is not exactly news, btw, it'sbeen known a long time), suddenly means that's all the brain does?

This is not how science works.

[+] whatshisface|3 years ago|reply

There are a lot of times when you're reading stuff that really does sound like the human equivalent of an LLM's output, but that is bad - you are not supposed to do it. A certain degree of that is necessary to write with good grammar but you are supposed to control your "tongue" (which is how previous generations would have phrased it) with the rest of your faculties.

[+] petilon|3 years ago|reply

LLMs are no different from us, because we modeled it after our brains.

These papers suggest we are just predicting the next word:

https://www.psycholinguistics.com/gerry_altmann/research/pap...

https://www.tandfonline.com/doi/pdf/10.1080/23273798.2020.18...

https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009...

https://www.earth.com/news/our-brains-are-constantly-working...

[+] cscurmudgeon|3 years ago|reply

The "just" in your comment doesn't follow from the article. There is no evidence that there is nothing other than "predicting the next word" in the brain. It may be a part but not the only part.

[+] permo-w|3 years ago|reply

there’s more to humans than language processing

[+] thewataccount|3 years ago|reply

Predicting words != LLM. There's different methods of doing it, current LLMs are not necessarily the most optimal method. The paper states this as well,

> This computational organization is at odds with current language algorithms, which are mostly trained to make adjacent and word-level predictions (Fig. 1a)

I feel like you're suggesting because humans != LLMs then humans cannot be doing next word prediction.

[+] peteradio|3 years ago|reply

What is a word?

[+] adamnemecek|3 years ago|reply

[deleted]

[+] evolvingstuff|3 years ago|reply

You have been shamelessly self-promoting your Hopf algebra/deep learning research on a very large percentage of posts I have seen on HN lately, to the degree that I actually felt the need to log in so as to be able to comment on it. Please. Stop.

[+] c3534l|3 years ago|reply

That is inscrutably abstract and jargony.

116 comments