
Modern language models refute Chomsky’s approach to language

159 points | hackandthink | 2 years ago | scholar.google.com

235 comments

[+] neosat|2 years ago|reply
The author (bafflingly) seems to have completely missed the point, since nothing they state up to page 15 (at which point I stopped reading) refutes Chomsky's points at all. The author talks about LLMs and how they generate text, then goes on to claim that this refutes Chomsky's claims about syntax and semantics. It does not, since Chomsky's primary claim is about how HUMANS acquire language.

The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way. [edited page = 15]
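
To make the "probabilistic analysis" point concrete, here is a minimal sketch (purely illustrative, not the paper's method) of generating text from nothing but corpus co-occurrence counts, with no grammar rules anywhere:

  import random
  from collections import defaultdict

  corpus = "the cat sat on the mat . the dog sat on the rug .".split()

  # Count bigram successors: which words follow which, by raw frequency.
  successors = defaultdict(list)
  for cur, nxt in zip(corpus, corpus[1:]):
      successors[cur].append(nxt)

  # Generate by sampling successors: no syntax rules, only statistics.
  word, out = "the", ["the"]
  for _ in range(8):
      word = random.choice(successors[word])
      out.append(word)
  print(" ".join(out))

Scale that idea up by many orders of magnitude and you get something LLM-shaped; whether children do anything similar is exactly the open question.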

[+] CydeWeys|2 years ago|reply
> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.

Also, the LLMs are cheating! They learned from us. It's entirely possible that you do need syntax/semantics/sapience to create the original corpus, but not to duplicate it.

Let's see an AlphaZero-style version of an LLM, one that learns language from scratch and creates a semantically meaningful corpus of work all on its own. It's entirely possible that Chomsky's mechanisms are necessary to do so.

[+] shaunxcode|2 years ago|reply
Thank you! It is like arguing that a human engaging in the creation of a landscape painting using the traditional method of oils has been "refuted" by a computer generating vector graphics from statistical descriptions of the same scene. Both yield art, but they are clearly different in interesting ways. Neither refutes nor outmodes the other. Or maybe I'm wrong and actually trees have refuted mushrooms!
[+] lsy|2 years ago|reply
For people who don't understand this, the reason humans refer to "Alex" much later in a story is not because they are statistically recalling that they said "Alex" dozens or hundreds of words earlier (as the LLM is described doing in the paper), but because they have a world-model they are actively describing, where "Alex" refers to an entity in that world-model. We know that the LLM is only saying "Alex" because it appeared earlier, but we also know humans don't work like that, so how can the LLM's generation of language say anything about how humans acquire and use it?
[+] FabHK|2 years ago|reply
Chomsky: Birds fly by flapping their wings in a specific way while changing the angle in order to create lift and propulsion.

This paper: Planes fly, but don’t flap their wings, ergo Chomsky is wrong.

[+] taeric|2 years ago|reply
I wouldn't be shocked to find that humans don't learn from syntax and semantics, all told. We certainly aren't doing that with our kids as they learn. And when they start picking up language, it is rapid and impressive. Note that it comes before they can speak, too. Seeing kids' ability to understand some complicated directions when they can only do rudimentary sign language is eye-opening.
[+] riku_iki|2 years ago|reply
> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.

We actually don't know what is inside the LM either, so it is possible the LM statistically learns syntax and semantics, and that this is a major part of output quality.

[+] kristopolous|2 years ago|reply
It's kind of like calling a hydraulic pump "mechanical muscle".

These types of "mistakes" are more about the authors making their intentions and hopes known about how they wish the thing to be used.

[+] asdff|2 years ago|reply
Imagine being told that all you need to do to learn Spanish is to read a 300,000-word Spanish dictionary end to end so that you can probabilistically come up with 1,000 conversational phrases. Anyone who has learned a language can tell you it just doesn't work like that. You don't learn by accumulating a massive dataset and training on it. No one can hold such a massive dataset of anything in their head at once.
[+] GuB-42|2 years ago|reply
We could use programming languages as a counterpoint.

LLMs can code in the same way they can use natural languages. But we know that programming languages have structure, we made them that way, from scratch, using Chomsky's theory no less.

Saying that because LLMs can learn programming languages using a different approach, they therefore disprove the very theory those languages are built on, is absurd.

Anyway, the paper is long and full of references and I didn't analyse it; does it include any look inside the model? For example, for LLMs to write code correctly, the structure of programming languages must be encoded somewhere in the weights of the model. A more convincing way to disprove Chomsky's ideas would be to find which part of the network encodes the structure of programming languages, and show that there is nothing similar for natural languages.
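
Something along those lines already exists in interpretability work as "probing": train a small classifier on a model's hidden activations and see whether syntactic structure is linearly decodable. A minimal sketch of the idea (the data here is random stand-ins, not real activations or tags):

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # Stand-ins for per-token hidden states from one layer of an LLM,
  # and the syntactic label (e.g. a POS tag) of each token.
  hidden_states = np.random.randn(1000, 768)
  pos_labels = np.random.randint(0, 10, 1000)

  # If a linear probe predicts syntax well above chance, that layer
  # plausibly encodes syntactic structure.
  probe = LogisticRegression(max_iter=1000).fit(hidden_states[:800], pos_labels[:800])
  print("probe accuracy:", probe.score(hidden_states[800:], pos_labels[800:]))

Running the same probe on code tokens versus natural-language tokens would be one way to attempt the comparison proposed above.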

[+] adastra22|2 years ago|reply
It is far, far more likely that the way humans learn language resembles LLMs than it does Chomsky’s model, however.

Biology is intrinsically local. For Chomsky's model of a language instinct to work, it would have to reduce down to some sort of embryonic developmental process consisting entirely of local gene-activated steps over the years it takes for a human child to begin speaking grammatical sentences. This is in direct contrast to most examples of human instinct, which disappear very quickly as the brain develops.

Really, the main advantage that Chomsky's ideas had is that no one could imagine how something simpler could possibly result in linguistic understanding. But large language models demonstrate that no, actually, one simple learning algorithm is perfectly sufficient. So why invoke something more complex?

[+] guerrilla|2 years ago|reply
> I also respond to several critiques of large language models, including [...] skepticism that they are informative about real life acquisition

Yeah the whole thing hinges on this... and uh yeah good luck with that one...

[+] MrBuddyCasino|2 years ago|reply
Wordcels think LLMs imitate the human brain, when a shape rotator knows they really just imitate human language.
[+] gizmo686|2 years ago|reply
This paper completely misses the point of linguistics as the discipline that generative grammar operates in.

We have long known that it is possible to understand language without innateness. That is what linguists do.

If you look at how linguists know about innate features, the answer is almost always by first discovering them explicitly while analysing language data; not by opening a brain to see what is innately inside. [0]

The point about innateness is that it takes generations of linguists to learn from a blank slate properties of language that children learn in just years.

There are also numerous other arguments for innateness: from the way humans seem to spontaneously develop language in a language-deprived environment, to the way language acquisition works being more consistent with other innate behaviours, to the presence of weird properties shared across languages for no apparent logical reason.

The only insight I see from LLMs is the same insight we have seen throughout machine learning: it is not necessary to understand something if you can throw enough compute at it. This is powerful, and it enables us to do a lot, but it should not be confused with understanding.

[0] There are some instances of leveraging MRI and other cognitive research techniques to get some insight into the inner workings of human language processing, but their role in developing current linguistic theory is thus far limited.

[+] zvmaz|2 years ago|reply
Also, dogs and cats, and even our close relatives, the primates, don't develop a capacity for language.
[+] bbor|2 years ago|reply
For a moment I was going to waste my afternoon arguing with people desperately predisposed to being the underdog in the fight against the Father of Linguistics, but you’ve said everything I ever could beautifully, so if this doesn’t help nothing will. Especially love the last paragraph, clarified that pattern for me.

On a lighter note, I do expect “Modern Language Models Refute…” to be the new “All you need is…”! It’s just too provocative not to click on

[+] zvmaz|2 years ago|reply
From the conclusion:

> First, the fact that language models can be trained on large amounts of text data and can generate human-like language without any explicit instruction on grammar or syntax suggests that language may not be as biologically determined as Chomsky has claimed. Instead, it suggests that language may be learned and developed through exposure to language and interactions with others.

I'm not a linguist nor a cognitive scientist, but this seems so problematic that I am not sure I read it correctly. For example, how does the fact that language models "work" contradict the innateness of language in humans?

[+] meepmorp|2 years ago|reply
> For example, how does the fact that language models "work" contradict the innateness of language in humans?

It doesn't. Also, the author doesn't seem to actually understand Chomsky's writing about language, because learning language via exposure is how humans learn languages and he fucking mentions that in his writing on the subject.

UG (universal grammar) is the purported faculty in the human brain which makes language possible. It has an innate structure, but it learns particular languages from exposure. Chomsky doesn't state exactly what that structure is because he doesn't know; figuring that out is the goal of his work.

[+] taeric|2 years ago|reply
It contradicts the idea that you have to teach language using syntax and grammar. Which... I confess I thought was already not believed? We certainly aren't teaching kids in the home how to decline and conjugate words. Are we?

(Similarly, languages that have gender are typically just picked up by usage, not necessarily ingrained by reasoning. Which leads to the obvious bad results when people think that there was solid reasoning on those choices, in the first place.)

[+] mcguire|2 years ago|reply
While I think some of the points in the article are interesting, the usual evidence for the Chomskian approach is the relative lack of input data for learning language by children in the wild.

How much input data is used to train modern language models?

[+] famouswaffles|2 years ago|reply
1. No language model is yet even close to the scale of the human brain.

2. Depending on what exactly you're trying to teach (perfect grammar, paragraphs of coherent text, basic reasoning), much less data is needed. https://arxiv.org/abs/2305.07759

3. Brains don't start at 0: evolution, DNA/RNA, etc. There's obviously some predisposition for language learning in humans, but that alone isn't enough ground for a "universal grammar".

4. We really do take in an enormous amount of data (not text specifically)

[+] MAXPOOL|2 years ago|reply
I agree. The question has not been settled.

A 20-year-old human has:

* heard ~220 million words, spoken ~50 million words.

* read ~10 million words.

* experienced ~420 million seconds of wakeful interaction with the environment (which can be used to estimate the limit on conscious decisions, or the number of distinct 'epochs' we experience).

From a machine learning perspective, a human life is a surprisingly small set of inputs and actions, just a blip of existence.
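
The arithmetic behind those estimates, roughly (the daily exposure rates are assumptions, not measured figures):

  SECONDS_PER_YEAR = 365 * 24 * 3600            # ~31.5 million
  wakeful_fraction = 16 / 24                    # assume ~16 wakeful hours/day
  wakeful_seconds = 20 * SECONDS_PER_YEAR * wakeful_fraction
  print(f"wakeful seconds by age 20: {wakeful_seconds:.2e}")              # ~4.2e8

  words_heard_per_day = 30_000                  # assumed average exposure
  print(f"words heard by age 20: {20 * 365 * words_heard_per_day:.2e}")   # ~2.2e8

GPT-scale training corpora run to hundreds of billions of tokens, roughly three orders of magnitude more.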

[+] api|2 years ago|reply
Do children really lack input data though? Human sensory input is quite a lot of data. Our languages may have common structure because that structure reflects causality and physics encountered by directly sampling the real world.
[+] michaelhartm|2 years ago|reply
Nobody knows how the LLMs work under the hood. It's just lots of stacked transformers that encode various concepts. Nothing in this paper settles whether Chomsky's concepts are actually being encoded in LLMs or not. For all we know, Chomsky's concepts of "binding principles", "binary branching", etc. could be represented inside the inner layers of these many-billion-parameter models. In fact, I'd argue that this is the right research to do: prove that no transformer or feed-forward layer inside the neural net encodes, say, "binding principles".
[+] michaelhartm|2 years ago|reply
Btw, semantics and syntax are separated in LLMs (the author is wrong). The embedding function (a matmul) can map syntax, and proximity in the embedding space (e.g. cosine similarity) is the semantics (that's what attention operates on). So I'm not convinced. Chomsky might be wrong or right, but this author hasn't proven it.
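
For readers unfamiliar with the jargon, a minimal sketch of embedding proximity (toy hand-made vectors, not real model weights), which is what this is gesturing at:

  import numpy as np

  def cosine(a, b):
      return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

  # Toy 4-d embeddings: nearby vectors stand in for related meanings.
  emb = {
      "cat": np.array([0.9, 0.1, 0.0, 0.2]),
      "dog": np.array([0.8, 0.2, 0.1, 0.3]),
      "car": np.array([0.0, 0.9, 0.8, 0.1]),
  }
  print(cosine(emb["cat"], emb["dog"]))  # ~0.98: close in embedding space
  print(cosine(emb["cat"], emb["car"]))  # ~0.10: far apart

Whether this cleanly separates "syntax" from "semantics" in a real transformer is contested; attention in particular mixes both.
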
[+] anthonybsd|2 years ago|reply
Why does everything these days revolve around ChatGPT (etc.)? You don't need LLMs to refute Chomsky's language models. Modern linguistics pretty much rejected [1] his theories on the basis of evidence.

[1] https://www.scientificamerican.com/article/evidence-rebuts-c...

[+] bbor|2 years ago|reply
Thanks for posting, finally some support for his supposed debunking! Interesting reading for sure.

  That work fails to support Chomsky’s assertions. The research suggests a radically different view, in which learning of a child’s first language does not rely on an innate grammar module.

  Instead the new research shows that young children use various types of thinking that may not be specific to language at all—such as the ability to classify the world into categories (people or objects, for instance) and to understand the relations among things. 

  These capabilities, coupled with a unique human ability to grasp what others intend to communicate, allow language to happen.

The fact that very smart people think this refutes Chomsky makes me quite sad. They basically restated the UG theory in the last sentence, as proof that it’s wrong…

Chomsky has been saying for literal decades that language is likely a corollary to the basic reasoning skills that set humans apart, but people still think UG means “kids are born knowing what a noun is” :(

[+] csb6|2 years ago|reply
Certainly there has been a shift of many applied linguistics researchers away from generative linguistics, but it is still quite common among university linguistics departments and continues to be actively researched (source: took linguistics courses in college a couple of years ago).
[+] comboy|2 years ago|reply
Tangentially related, but it's interesting that Chomsky has stated in a few interviews that he sees LLMs as just plagiarism machines, that they don't create anything new. Which I disagree with: us being creative is also just colliding patterns together. But at the same time I kind of assign higher value to his opinion than mine...
[+] z3c0|2 years ago|reply
These are two different modes being conflated. A person committing plagiarism is akin to how a GPT creates a document: "Okay, this word... and then this word... and then this word..."

This is opposed to modeling a concept in your mind, and then applying language through denotation. This isn't unlike composing a request to be sent over a specific protocol. The data exists independently of the protocol and could even be fit to more protocols, with the right understanding of how to implement them. Sure, you have to read some docs, and maybe use a library somebody else wrote, but nobody in their right mind would call that plagiarism. This is more akin to how language works in the human brain, where each new language is like a different protocol.

[+] why-el|2 years ago|reply
> First, the fact that language models can be trained on large amounts of text data and can generate human-like language without any explicit instruction on grammar or syntax suggests that language may not be as biologically determined as Chomsky has claimed

"The fact that this advanced drone released in the year 2080 that can drive with the same agility as an eagle proves that the eagle's flying ability is not as biologically determined as some people claim.

In fact, any organism can fly if it sees enough data about flying!"

[+] effed3|2 years ago|reply
I remember, some time ago, someone (more than one, to be precise) used NNs (neural nets) to solve some math problems (differential equations, IIRC; I have lost the refs). Suddenly some others declared the death of mathematics and, in general, of science as we know it: just pour all the data into a NN and the solution will appear, no more theories, models, hypotheses, or experiments needed. Nothing of the sort happened, but from time to time this hallucination returns, missing the difference between science and technology, between a machine that works (or appears to work) and a model/theory that explains how and why.
[+] dogcomplex|2 years ago|reply
Savage, but maybe fair. If one of Chomsky's underlying claims is indeed that language requires innate hard parsing rules and can't just be derived from probabilistic sampling of a bunch of data - that seems completely dead in the water.

It is entirely likely that the way we operate is probability-first, only deriving rules loosely after taking in lots of experiential data to speed up and simplify that initial fake-it-til-you-make-it understanding. The fact LLMs can get the quality we see using just this approach is a strong indicator that this method of understanding may be a fundamental approach of biological systems too.

(and if you're arguing this is unfair because humans created the language that's being used for probabilistic training - well, look at image models trained on photographs instead and tell me those aren't an example of extreme quality derived purely from mass-inferenced data. Rules-based architectures need not apply.)

But honestly, this seems like a silly claim to begin with if it really was claimed. We have formal language theory complexity classes of probabilistic algorithms for a reason - they work! It shouldn't be surprising that the model can stretch down to the fundamentals too. Far fewer programmers (and linguists) were raised to think with these models than deterministic rules-based ones, but the field has been progressing alongside for decades, and now they get to play with powerful LLMs that take probabilistic inferencing to the extreme and will likely prove it works (very elegantly) for everything. This shouldn't be surprising in retrospect.

Chomsky may very well be right that there always exists some fundamental elegant formula underlying any phenomenon (or at least any language). But it's undeniable at this point that simplistic statistical approaches can be applied at scale to those phenomena and derive highly useful general models, which will also very likely converge upon the elegant formulas he envisioned. The two are intrinsically linked, neither separable from the other.

[+] Guidii|2 years ago|reply
I'm unconvinced.

The author seems to be stating that because LLMs have generated "interesting" statements, all of Chomsky's approaches are invalidated. This seems to assume that the LLM isn't somehow incorporating Chomsky's generative grammar system into the model, which would be a very hard thing to prove.

Also, the author's stated proof contains flaws. Take for example page 16:

Generate ten other sentences like "colorless green ideas sleep furiously": Purple fluffy clouds dream wildly. Blue glittery unicorns jump excitedly.

The author claims that the LLM has completed this task successfully, completely ignoring the example's pairwise incompatible terms ("colorless green", "sleep furiously") and accepting that "purple fluffy" is equivalently meaningless. It is not, and the model has clearly failed.

[+] JoshTko|2 years ago|reply
Language is essentially compression. And studies show humans automatically compress data relative to chimps. So there are likely biological structures that help humans compress concepts. Compression could also be an emergent property of having more layers in a NN.
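
The compression framing has a standard formalization: a language model's per-token cross-entropy is the number of bits an arithmetic coder would need to encode the text. A minimal sketch with made-up probabilities:

  import math

  # A model that predicts the next word well compresses text well:
  # code length in bits = -log2(P(actual next word)), summed over the text.
  predicted_probs = [0.5, 0.25, 0.9, 0.1]  # hypothetical model outputs
  bits = sum(-math.log2(p) for p in predicted_probs)
  print(f"{bits:.2f} bits total, {bits / 4:.2f} bits/word")

Better prediction means fewer bits, which is why language modeling and compression are often treated as the same problem.
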
[+] PeterCorless|2 years ago|reply
The LLMs don't understand anything. At all. They have rules they respond to. But that doesn't mean they actually "get" a single word of what they're blathering.

A computer has no sense of "intent" or "meaning" to what they have indexed and scanned. There is no intent or meaning to what they spit out in generated text.

We're back in the Chinese room. https://en.wikipedia.org/wiki/Chinese_room

[+] azakai|2 years ago|reply
The Chinese Room Argument is considered by many to be incorrect on those matters. But it has led to decades of interesting philosophical debate, which is worth reading to see the various perspectives. This is an excellent summary:

https://plato.stanford.edu/entries/chinese-room/

[+] pixelmonkey|2 years ago|reply
On Chomsky, here is a recording of some audio from Chomsky where he uses a rhetorical argument for why language may be innate to humans rather than (fully) learned. From 1992. Just 5 minutes.

https://youtu.be/CPgDALpS-7k

Let's make sure that what the computer scientists understand Chomsky to have stated actually aligns with his view. Chomsky didn't say the ONLY way to create language is via the brain. His view, instead, is that evolution programmed language development into the brain: that it is not learned (entirely) by peer osmosis. That the brain has some structure for language, built in, which is then unlocked in various ways via socialization.

Summary of Chomsky's view, paraphrased:

"It is a strange intuition [that most other people have]. Above the neck, we insist everything [in human development] comes from experience. Below the neck, we're willing to accept the idea that [...] it comes from inside. [...] But: it is hard to look at the Sun setting and say, it's not 'setting', the Earth is actually turning. Similarly, with people, it's hard for us to look at them and not see them as minds inside bodies. This leads us to this false approach: below the neck, we are willing to pursue the sciences, and if that leads us to believe development is internally programmed, we'll accept it. But above the neck, we'll be completely irrational; we're going to insist on beliefs and explanations we'd never normally dream of in other rational areas."

---

This YouTube clip came to mind, but here is a more detailed explainer from Stanford Encyclopedia of Philosophy:

"Clearly, there is something very special about the brains of human beings that enables them to master a natural language — a feat usually more or less completed by age 8 or so. ... This article introduces the idea, most closely associated with the work of the MIT linguist Noam Chomsky, that what is special about human brains is that they contain a specialized ‘language organ,’ an innate mental ‘module’ or ‘faculty,’ that is dedicated to the task of mastering a language.

On Chomsky's view, the language faculty contains innate knowledge of various linguistic rules, constraints and principles; this innate knowledge constitutes the ‘initial state’ of the language faculty. In interaction with one's experiences of language during childhood — that is, with one's exposure to what Chomsky calls the ‘primary linguistic data’ or ‘pld’ — it gives rise to a new body of linguistic knowledge, namely, knowledge of a specific language (like Chinese or English). This ‘attained’ or ‘final’ state of the language faculty constitutes one's ‘linguistic competence’ and includes knowledge of the grammar of one's language. This knowledge, according to Chomsky, is essential to our ability to speak and understand a language (although, of course, it is not sufficient for this ability: much additional knowledge is brought to bear in ‘linguistic performance,’ that is, actual language use)."

source: https://plato.stanford.edu/entries/innateness-language/

[+] bbor|2 years ago|reply
Fantastic comment - it never occurred to me to look for an article on the topic on the Stanford encyclopedia! Good stuff, as always.
[+] aap_|2 years ago|reply
It sure is an interesting time for linguistics. For the first time, there is now a second entity that can understand and produce language. It will be exciting to see how the field transforms.
[+] zvmaz|2 years ago|reply
> For the first time, there is now a second entity that can understand and produce language

But does it really understand language? This reminds me of the Chinese room argument [1].

[1] https://en.wikipedia.org/wiki/Chinese_room

[+] adamnemecek|2 years ago|reply
I'm betting on learning (both human and ML) being analogous to renormalization in QFT. Essentially, you are trying to explain a process in terms of imperfect patterns. With incoming data, you update both your patterns and the locations where these patterns occur, at once.

There is some research into this https://phys.org/news/2022-05-renormalization-group-methods-...

[+] renewiltord|2 years ago|reply
I only skimmed a few pages, but it doesn't appear to be about the Chomsky hierarchy - which was what I was curious about.
[+] cma|2 years ago|reply
DeepMind had this paper on that, showing weaknesses of transformers relative to other models in generalizing from example productions from different tiers of the hierarchy:

https://arxiv.org/abs/2207.02098
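
For context, the "tiers" are language classes of increasing power (regular, context-free, context-sensitive, recursively enumerable). A minimal sketch of sample productions from two tiers, similar in spirit to such benchmarks (the paper's actual tasks differ):

  # Regular tier: (ab)^n is recognizable with finite memory.
  regular = ["ab" * n for n in range(1, 4)]                 # ab, abab, ababab

  # Context-free tier: a^n b^n needs a counter or stack to check.
  context_free = ["a" * n + "b" * n for n in range(1, 4)]   # ab, aabb, aaabbb

  print(regular, context_free)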