It's a little bit frustrating to read a rehash of an argument that was cutting edge maybe back in the late 90s, especially one that is so poorly written, and framed as a battle between two intellectuals.
Chomsky's past his heyday. He has been seminal in his field, but he's no longer doing research that pushes at the boundaries of our understanding of language, how to model it, or what the fundamental nature of language-understanding systems is. (As one might infer, I come from a non-Chomskyan school of linguistics.)
Given that we have actual data and research about large-scale systems that do interesting things (including the massive artificial neural network that Google built last month, see: http://www.wired.com/wiredscience/2012/06/google-x-neural-ne... ), reporting as substance-free and obfuscatory as this is a real frustration, when we could be talking about more interesting things: what a solid operational definition of meaning is, how exactly heuristic/rule-based systems actually differ from statistical mechanisms, and whether or not all heuristic systems can (or should) be modeled with statistical systems.
The framing of this article is particularly galling because there are so many non-Chomskyan linguists out in the world who operate fruitfully in the statistical domain. Propping Chomsky up as somehow representative of all linguists is pretty specious and a bit irritating.
It's a rehash of http://norvig.com/chomsky.html and the talk by Chomsky that led to it (both recent and relevant). I can't wait to read the continuation of that article that Norvig promised in comments.
Non-Chomskyan linguists? I was under the impression that, post-Chomsky, linguistics was defined as Chomskyan; everyone else had left the field for whatever related discipline most closely matched what they wanted to do.
I spent about ten years working on Markov-based chat programs. I gave up on them when I realized that no matter how sophisticated your statistical model is, it will never be more than a statistical analysis of text unless it includes some rich rule-based model of mental processes and mental objects. It may be that such a model of mental processes must itself be fuzzy and probabilistic, but it must exist. Therefore I come down firmly on the side of Chomsky in this debate: we should pursue theories of intelligence, and statistical models without any theory do not advance our scientific understanding of AI, however practical their application may be at the present time. This is not to say statistical methods do not work; of course they work. What I am saying is that it is not a path that leads to true understanding of intelligence, any more than spectral analysis of the EMF emissions of a running computer would lead to a theory of computation.
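For context, a Markov chatter is little more than a table of n-gram transition counts sampled forward. A minimal bigram sketch in Python (the toy corpus and all names are my own, purely illustrative):

    import random
    from collections import defaultdict

    def train_bigrams(text):
        # Count word-to-successor transitions from raw text.
        successors = defaultdict(list)
        words = text.split()
        for a, b in zip(words, words[1:]):
            successors[a].append(b)
        return successors

    def babble(successors, seed, length=12):
        # Walk the chain: each word depends only on the previous word.
        out = [seed]
        for _ in range(length):
            choices = successors.get(out[-1])
            if not choices:
                break
            out.append(random.choice(choices))
        return " ".join(out)

    model = train_bigrams("the cat sat on the mat and the dog sat on the log")
    print(babble(model, "the"))

Every word is chosen by looking only at the word before it; nothing anywhere represents what is being talked about, which is exactly the limitation described above.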
Just because you chose Markov chains as your modeling mechanism doesn't mean that there is no statistical modeling method that is capable of developing something passing for what we'd call "meaning".
This is the same argument that was used against artificial neural networks. Neural network of type A can't do X, therefore neural networks will never do Y.
Language is immensely complex, and real human language involves things which are not encoded in text (and I'd remind you that you were trying to infer meaning from text specifically, not from the full multi-channel robustness of humans communicating). We don't even have a full handle on what all of the cognitive processes and factors are that go into the production and understanding of language (although we've developed a lot of interesting work to those ends).
So hearing folks give up and claim that Chomsky is correct because our current tools aren't up to the job is a bit puzzling, because we don't even have a complete understanding of what sort of thing language is, or what sorts of things we are as systems which can use language.
Chomsky has opinions (and some facts) about what language is and what we are, but he does not have solid proof to confirm his specific conjectures. Is human language context-free? Context-sensitive? Something else? (Chomsky's Minimalist Program uses movement along a tree to preserve referentiality and a bunch of junk; alternative syntactic frameworks such as HPSG use directed graphs as the basis of their language modeling. Still others do weirder things like higher-order combinatory logics. And unfortunately none of the theoretical frameworks appear to be without their drawbacks.)
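To make the formal-language question concrete: a context-free grammar generates sentences by recursively rewriting symbols, and that recursion yields unbounded nesting that no fixed-order Markov model can capture. A toy sketch (this grammar is my own invention, not anyone's theory):

    import random

    # A toy context-free grammar: each symbol maps to possible expansions.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"], ["the", "N", "PP"]],
        "VP": [["V", "NP"]],
        "PP": [["near", "NP"]],
        "N":  [["linguist"], ["model"], ["corpus"]],
        "V":  [["studies"], ["builds"]],
    }

    def generate(symbol="S"):
        if symbol not in GRAMMAR:  # terminal word
            return [symbol]
        expansion = random.choice(GRAMMAR[symbol])
        return [word for part in expansion for word in generate(part)]

    print(" ".join(generate()))

Whether human syntax needs context-free power, context-sensitive power, or something else entirely is precisely the open question.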
Two experiences have made it clear to me that humans don't understand language that well without context:
(1) Raising a child. My dad often remarks that he's surprised that my daughter knows how to use a word in just the right context. I'm not, because this is a natural product of mimicry: if you copy what others say, you usually use the words in the correct context. As with computer-generated text, the exceptions are often hilarious.
(2) Song lyrics. I had a very clear experience where I just could not understand the refrain of Gold Guns Girls. It sounded completely unintelligible until I read the lyrics. After that, it sounded crystal clear. Why would reading the lyrics make the song sound different? Context.
There is no valid argument leading FROM your disenchantment with Markov based chat programs, TO a conclusion that machine learning is invalid. Markov chatters are a toy.
Did you produce a better chat bot based on UG? - No? Then on what basis are you junking machine learning?
Machine translation is far from a solved problem. But Chomsky's school claimed they were going to solve it in the 60s or perhaps the 70s. Do you know what the basis of the most successful current approach to machine translation is?
Analysis of text. (But not the kind of simplistic junk one does in a Markov chatter)
All you have done is suggest that some beautiful perfect text model exists natively in every person. (Presumably this evolved somehow - or if you are Chomsky, it just developed like a crystal for no apparent reason). This isn't an explanation of anything unless you actually find that model instantiated in the brain. But this is just not happening. So either our instruments are still too crude to detect it, or it's not really there.
Appealing to an as-yet unknown perfect universal text model does not build a better chat bot or a better explanation of human behavior.
True understanding of intelligence must incorporate an understanding of how learning occurs. Because anyone who watches children sees learning occurring, and only doctrinaire Chomskyists deny that it occurs (because it is not beautiful enough and some abductive argument is claimed to show that it is not sufficient).
The fact that Markov chains are by definition memoryless isn't an argument in favor of Chomsky or magical thinking. Sure, if you want to improve your output you can use (n+1)-grams instead of n-grams, but the curse of dimensionality is going to catch up with you quickly. Language smoothing will help for a little while. Over a long enough horizon, all Markov chain output is gibberish. None of these obvious limitations are an argument against statistical models.
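To put numbers on the curse of dimensionality: with vocabulary size V, an order-n model has V^n possible contexts, so observed counts are zero almost everywhere and smoothing just redistributes a little probability mass to the unseen cases. A minimal add-one (Laplace) trigram estimate, with toy data of my own:

    from collections import Counter

    words = "the cat sat on the mat and the dog sat on the log".split()
    V = len(set(words))

    trigrams = Counter(zip(words, words[1:], words[2:]))
    bigrams = Counter(zip(words, words[1:]))

    def p_laplace(w3, w2, w1):
        # P(w3 | w1 w2), add-one smoothed: unseen events get a small
        # nonzero probability instead of zero.
        return (trigrams[(w1, w2, w3)] + 1) / (bigrams[(w1, w2)] + V)

    print(p_laplace("sat", "cat", "the"))   # seen trigram
    print(p_laplace("flew", "cat", "the"))  # unseen, but not zero

At a realistic vocabulary of, say, 10^5 words, the trigram table already has 10^15 cells, nearly all of them empty - hence smoothing, backoff, and eventually different model classes altogether.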
Where is the data showing that statistical methods don't 'advance our understanding'? What does an EEG tell us about how the brain works?
It strikes me that the theories are not mutually incompatible at all. They have very different purposes. Chomsky is trying to understand meaning and intelligence at a deep level. Norvig is trying to build models that help people right now (and incidentally, help his company make more money). Any new insights from either path will help refine the other.
So it sounds like what you decided was that you wanted to explore one path and not the other. Nothing wrong with that, but it's a very different statement.
As far as language modeling, this is a recent paper that models language on the character level rather than word level and can track long term dependencies and even generate plausible sounding non-words from time to time: http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf
The state of the art is improving a bit, although this method still knows nothing of meaning, so it can often generate some strange sentences. Still, I wouldn't write off the whole field yet - just because something didn't work with the tools of years past doesn't mean it isn't possible.
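For intuition about what 'character level' means, here is a deliberately crude stand-in of my own - a character-bigram sampler, not the paper's recurrent network (the RNN carries hidden state across the whole sequence, which is what enables the long-range tracking):

    import random
    from collections import defaultdict

    text = "the model generates plausible non-words from time to time"

    # Count character-to-character transitions.
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1

    def sample_char(prev):
        # Draw the next character in proportion to observed counts.
        options = counts[prev]
        if not options:
            return " "
        chars = list(options)
        return random.choices(chars, weights=[options[c] for c in chars])[0]

    out = "t"
    for _ in range(40):
        out += sample_char(out[-1])
    print(out)  # locally plausible, globally meaningless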
1.) Measure data, then make an educated guess. 2.) Make an uneducated guess.
This is one of those rare moments in intellectual life where, having been in the room and now seeing the debate develop, it becomes clear that the resulting hype isn't (wasn't) loud enough.
This distinction marks the real turning point in AI from abstract, grand claims with highly restrictive evidence toward engineering that simply works. Who cares about the ontology when we can recreate? It's like saying airplanes don't properly explain flight because they don't replicate how birds do it. Who cares? We can fly (and translate and soon reason) artificially.
It's clear that Chomsky and Universal Syntax has held back the entire field of AI (and at MIT). There isn't one algorithm in the human mind to decode all of our mental capabilities. That's mistaking subjectivity for objective lessons. Trying to recreate that phantom has led to rule tables in AI, constraints on how the mind must operate. Instead, by allowing those fuzzy boundaries to accumulate with evidence, statistical approaches win in the long-term of our lives and in this debate.
Kuhn knew what happens to dinosaurs.
I don't think I would take that strident battle-against-dinosaurs view, in part because I think intellectually understanding things is useful in and of itself, not some kind of "just build it and shut up" anti-intellectual view; and also because I don't think it's an accurate summary of the history of AI. There's no particular reason we can't both build and study things in various ways, and the history of AI has been full of people doing many takes on both.
In particular, statistical approaches have been used for a long time, but were not practical until fairly recently; it was the lack of "big data" computing power holding them back more than anything. Statistical machine translation and parsing experiments have been tried on and off for decades, but with 1950s-era data they produced total garbage as output, even worse than the (also bad) symbolic approaches. Hence why Shannon's work on text processing didn't produce practical NLP or NLG systems. It took Google-sized data to produce statistical translation that was actually usable.
The numerical approaches that were possible on the computers of the time were investigated fairly extensively once they became feasible (e.g. the 1980s focus on "sub-symbolic AI", with perceptrons, neural networks, numerical regression methods, etc.). Some were shelved for years because they just didn't work as well, e.g. symbolic game-tree search massively outperformed machine learning in board games in the early experiments, which is why Samuel's 1950s ML-based checkers player was theoretically intriguing but not considered very practical.
> There isn't one algorithm in the human mind to decode all of our mental capabilities.
Careful here. There could be one algorithm, but we may not be able to express its parameters in the way we like to for simple models.
These statistical approaches are algorithms. It's the fact that we can't make sense of the parameters that seems to lead people to believe that they don't explain anything.
The fact that we can't categorize all life in an unambiguous way that makes sense to humans doesn't mean that the elegant and simple algorithm of evolution by natural selection is wrong.
Model vs. initial conditions. Knowledge vs. search.
"Who cares about the ontology when we can recreate? It's like saying airplanes don't properly explain flight because they don't replicate how birds do it. Who cares?"
Well, I do. Being able to understand something from first principles is very different from being able to model it and gain an approximation of it.
One might point out that this debate crops up in a different form in developer circles, but reframed as "Learn assembly & cs theory vs. use IDE & ignore assembly and theory".
> It's clear that Chomsky and Universal Syntax has held back the entire field of AI
That may be true, but there is value to understanding language for the sake of language, outside the practical goals of improving AI. There is plenty of evidence for parameters playing a role in human language whether or not a parametric implementation of NLP is possible, desirable or necessary. It's certainly true that a statistical approach that bears plenty of fruit in AI applications is going to be very strained to provide anything of value back to linguistics or developmental psychology.
The analogy doesn't hold. We don't want planes to fly like birds, but we may want machines to think like humans.
I've been in machine learning/AI for ten years now - from undergraduate research, to graduate school, to industry - and I find debate like this fascinating. My take on it is that our understanding of what we will be able to do in the future is very unclear, and what we will want to do is very open-ended. So the debate is worth having, but it won't really resolve anything.
Statistical models may (in my opinion probably will) end up being an "AI" dead-end, eventually falling into other fields such as algorithms, like game trees and logic-based agents did. That's not to say the current statistical approach is a bad idea; on the contrary, I think these techniques are useful and simple enough that they will become fairly ubiquitous in CS.
On the Chomsky side of the argument, AI researchers have been consistently frustrated over the past 50 years, to the point that studying AI today makes you sound like a joke. But their goal is a noble one. Anyone can understand how great it would be to have a human-level intelligence on a chip - this would fundamentally change the world. The fact that we haven't made a dent in this problem doesn't mean the problem isn't worth solving; it just means our understanding of what it takes to build this kind of AI is in its infancy.
I almost feel like Norvig and Chomsky are arguing in parallel. They are both right, but their arguments are valid on different time scales. Today, the Norvig approach will easily win out; Chomsky has nothing and is largely irrelevant. But Chomsky is, IMO, correctly predicting what will need to happen to move beyond an eventual roadblock in a much grander AI.
They have two different definitions of "artificial intelligence," which is where the schism seems to be arising from.
Chomsky takes the academic approach - artificial intelligence is the simulation of humanlike (or even possibly mammalian) intelligence.
Norvig is taking the engineering approach - artificial intelligence needs only to pass the Turing test.
They're both right, both approaches have value, and they both are bound by our limited technology at the moment.
In the end, though, Norvig will lose out. Sure, he'll make the finish line first - an AI capable of 'passing' the Turing test, but in order to have real intelligence you need an analytical engine (or brain, if you will) that can prioritize data without fiddling with bits. In the Norvig solution, someone will always have to be fiddling with the bits.
Chomsky's approach, on the other hand, will result in a 'true' artificial intelligence, the way neurologists understand it. It's just going to take a lot longer to get there.
Having studied Chomsky a fair bit in grad school, and also studied cognitive linguistics a fair bit in grad school, I think the idea that Chomsky's models will ever win anything is just wrong.
Chomsky's central problem is that his modeling is not based on anything biological at all. His models don't correspond to reality. Some of them were based on assumptions about how the brain works that were untestable in the 50s and 60s, when all of his linguistic models were developed; those assumptions have since become testable and are not particularly evident in the way we currently understand the brain to work.
Given this fact, I think your current best bet is Norvig as a modern approach to AI or anything linguistic-y. But this is only because it is slightly more grounded in reality rather than being something that Chomsky (who is a very smart guy) came up with on his own without the benefit of actual biological models of the brain.
In the end, I think there will be (eventually, a long time from now) an actual model of how the brain processes language, based on actual observations of working brains. It will throw away much of what Chomsky has proposed, though it will probably keep some of it, and it won't use huge Google-esque lookup tables, but it will be highly influenced by statistics.
But until we get to that point, statistics are probably your best bet since at least they're grounded in reality (unlike much of Chomsky's work).
> Chomsky's approach, on the other hand, will result in a 'true' artificial intelligence, the way neurologists understand it. It's just going to take a lot longer to get there.
High-level behavioral impressions taken by a neurologist are a convenient abstraction. That this high-level behavior is useful in monitoring mental state (outputs) says very little about the underlying 'hardware'. In fact, this is the fundamental debate in cognitive science: whence does intelligence arise? Theories generally fall under two headings, 'top-down' and 'bottom-up', which roughly correspond to 'pre-programmed' and 'emergent'. The canonical bottom-up approach is the neural network, approximating cells with various equations that govern behavior (outputs) based on aggregate input (there are various levels at which this can be done). There are a variety of top-down approaches; a typical one would take the form of logic engines (think Prolog) or generative rules (Chomsky).
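For concreteness, the bottom-up primitive really is that small - a unit that squashes weighted aggregate input into an output rate, e.g. (the weights and inputs below are arbitrary):

    import math

    def neuron(inputs, weights, bias):
        # One 'cell': aggregate the input, squash it to an output.
        activation = sum(i * w for i, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-activation))  # logistic squashing

    # Three input signals feeding one unit.
    print(neuron([0.5, 0.1, 0.9], [1.2, -0.4, 0.3], bias=-0.5))

Everything interesting lives in how many such units are wired together and how the weights get set; nothing about the unit itself commits you to the top-down or the bottom-up story.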
Statistical modelling approaches are closer to bottom-up, but depending on the model they may still incorporate domain knowledge that is emergent from the model input.
Statistical approaches have momentum these days due to considerable success - thanks largely to Moore's law. However, they also have biological support: what is a neuron? It's an FPGA with a lot of electrical and chemical inputs. Small neural circuits can behave statistically, and it's an open question whether this gives rise to high-level behavior. A big reason it's an open question is that we don't yet have the spatial or temporal resolution to measure enough signals.
That said, there is plenty of room for what I consider a happy medium: locally statistical behavior, but globally (and generationally) top-down organization driven by genetics.
A significant proportion of the "academic approach" is actually a third one: artificial intelligence is the analysis and implementation of rational decision-making. That approach tends to care neither about biological accuracy, nor believability in a Turing-test sense. Rather, it cares about whether its decisions are correct based on evidence available to the decision-maker. That's the kind of attitude you most often find in both statistical and logic-based AI circles.
Actually, Russell & Norvig's AI textbook has a nice summary of these different approaches to AI in its intro chapter.
I'm going out on a limb because I don't really know much about neurology & I may be wrong about facts, but... I think an issue here is that we don't really know what "natural intelligence" is.
For a significant part of the scientific age we knew about genes in some sense without knowing much about them. We called them traits, observed & measured them. We got to know some "rules" about their inheritance. But it wasn't until genetics got to be a little better understood that we got to know their physical manifestation. We can explain the difference between genetic & cultural (memetic?) inheritance in these terms. A descendant's cooking habits are memetic and her hair colour is genetic.
When it comes to neuroscience, I think we're where we were a century ago in biology. Emotions, thoughts, memories. We don't know what their physical manifestation is. We don't know how they work. Since we don't know much about how natural intelligence works, I think our common-sense definition of intelligence is, to a certain extent: "stuff we can do that computers can't."
I think that if we had a definition that was more functional than observational, you wouldn't be hesitant at all to use "mammalian" in your definition. Whatever processes result in observed human intelligence are almost certainly shared with other species. We'd probably also know where to draw the lines: reptiles? invertebrates? fungi?
If apes have intelligence, goldfish don't, but octopi do, that suggests there are multiple versions of natural intelligence.
“Norvig is taking the engineering approach - artificial intelligence needs only to pass the Turing test.”
Passing the Turing test and the simulation of intelligence are supposed to be the same thing. Turing came up with the test to sidestep the definition of intelligence.
I’m not sure what you mean by “but in order to have real intelligence you need an analytical engine (or brain, if you will) that can prioritize data without fiddling with bits. In the Norvig solution, someone will always have to be fiddling with the bits”.
Do you mean bits as in pieces, or as in ones and zeros? If the former, haven't Chomsky's models required the addition of new parameters as exceptions to his rules are found? If you mean the latter, how is a computer to prioritize data if it can't look at bits?
There seems to be a misconception that Norvig does not use simple models. He does; they're just models that use statistics to train and learn. His approach strikes me as elegant, simple and more robust to changes in language over time than Chomsky's.
But what exactly is true artificial intelligence? For example, I consider Google search and Wolfram Alpha very intelligent. They can do math, answer questions, rank information, follow current events, ...
I personally believe that it will be possible to simulate a working brain in sufficient detail that it is able to "think" like a human does before humans understand said brain.
In fact I doubt that the cognitive capacity of a human brain is enough to truly understand an operating human brain.
This is exactly my take on it. They are talking about AI in different contexts, and therefore aren't really arguing with each other as much as past each other. Anyone who has any interest in building something today would take a Norvig approach, and anyone who pictures AI 100 years from now should hope that the Chomsky approach eventually won out.
http://norvig.com/chomsky.html
Second: I found it astounding that the article never mentions Skinner. Surely this article is trying to do to Chomsky what Chomsky did to Skinner in 1959 ("A Review of B. F. Skinner's Verbal Behavior", http://www.chomsky.info/articles/1967----.htm ).
Chomsky basically marked the beginning of the modern era of cognitive psychology with that essay, displacing the previous paradigm of behaviorism. Norvig's article has a similar form in some ways to that article, and similar goals (to argue for a new paradigm over an older one). As I was reading it, I was sure Norvig had that context in mind. So I was surprised to read
> So how could Chomsky say that observations of language cannot be the subject-matter of linguistics? It seems to come from his viewpoint as a Platonist and a Rationalist and perhaps a bit of a Mystic
Well, no, Chomsky explained very well why he opposed observations being the subject matter of linguistics in his 1959 essay. Skinner's behaviorism looked only at observations and experience, and did away entirely with internal mental states. That might seem bizarre to us today, and the reason is in large part the shift heralded by Chomsky's article from behavioral psychology to cognitive psychology. In the latter, the goal is to understand the internal processes that are involved in psychology (or specifically language).
Statistical language models are not behaviorism. But they do share a lot with it: they are based primarily on raw empirical observations as opposed to deep models, so it is natural for Chomsky to oppose them on similar grounds (and not due to Platonism or Rationalism, although I suppose you can speculate that those motivated his 1959 essay too).
Side note, we can speculate that if Skinner had today's computers and statistical modelling methods, the shift from behaviorism to cognitivism might never have happened, seeing as the statistical approach is so successful.
I know a card counter. I showed him how to condition probabilities to determine how to best play. He went for the full Monte Carlo method and he lets his simulation run for a week before he starts using it "just to make sure". It's frustrating because he doesn't get that his results are statistically significant after about 30 seconds of runtime. He still makes money doing it. The results are tangible, but he's still just mucking about.
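The thirty-seconds-vs.-a-week point is just the square-root law: the standard error of a Monte Carlo estimate shrinks as 1/sqrt(n), so each extra factor of 100 in samples buys only about one more significant digit. A toy illustration (a coin flip with a built-in 1% edge standing in for the card game; the numbers are mine):

    import random

    def estimate_edge(n):
        # Monte Carlo estimate of the expected value of a biased bet.
        total = sum(1 if random.random() < 0.505 else -1 for _ in range(n))
        return total / n

    for n in (10**3, 10**5, 10**7):
        # True edge is 0.01; watch the estimate tighten as n grows.
        print(n, estimate_edge(n))

Past the point where the confidence interval is tighter than any decision it informs, the extra runtime is pure superstition.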
'Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the "old one." I, at any rate, am convinced that He does not throw dice.' --Einstein
Statistical methods can work but they are unsatisfying to the scientifically curious. You're not really a scientist if you create something that works and you don't really know why. (Not to say that the method doesn't have value. Sometimes you have to play with your Lego before you grow up.)
I picture Chomsky as Kepler, trying to build orbits out of Platonic solids.
Until Kepler had access to Brahe's data, he was not going to be able to come up with his theories of planetary motions.
Worse than that, the laws of planetary motion present a simplistic view of the universe: what happens when a bunch of small objects orbit a very massive object. I think they wouldn't help you out at all, in trying to understand planets moving in a binary star system.
There is no analytic solution to the N-body problem. We can only simulate the motions of a group of massive bodies by iteratively applying the laws of gravitation that we have deduced. Knowing the mathematical properties of how objects behave in a gravitational field and actually understanding HOW GRAVITY WORKS are two enormously different things. Newton was frustrated with the theory of gravity because it was, like Norvig's models, just a model - with no explanation of why. But the model allows you to make falsifiable predictions, and understand how the universe will behave. Looking for the Higgs boson is awesome - but there is potentially no equivalent in the linguistic world.
Chomsky asks us to ignore F = G * m1 * m2 / r^2, because there's no WHY attached to it.
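And yet that formula is enough to simulate with, WHY or no WHY: apply the inverse-square force in small time steps and orbits fall out. A crude sketch (Euler integration, in toy units of my own choosing where G times the central mass equals 1):

    # Small body orbiting a heavy central mass fixed at the origin.
    x, y = 1.0, 0.0    # position
    vx, vy = 0.0, 1.0  # velocity chosen for a roughly circular orbit
    dt = 0.001

    for step in range(10000):
        r3 = (x * x + y * y) ** 1.5
        ax, ay = -x / r3, -y / r3  # inverse-square attraction
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt

    print(x, y)  # after ~1.6 orbits, still close to the unit circle

The simulation predicts without explaining - which is exactly the parallel being drawn to Norvig's models.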
PS - this understanding of the history of science is brought to you by Carl Sagan's Cosmos TV series. I have no deeper insight than that.
Isn't this basically an argument over John Searle's Chinese Room thought experiment?
It supposes that there is a program that gives a computer the ability to carry on an intelligent conversation in written Chinese. If the program is given to someone who speaks only English to execute the instructions of the program by hand, then in theory, the English speaker would also be able to carry on a conversation in written Chinese. However, the English speaker would not be able to understand the conversation. Similarly, Searle concludes, a computer executing the program would not understand the conversation either.
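Reduced to code, the room is nothing but a lookup from input symbols to output symbols; the operator needs no idea what either side means. A deliberately silly sketch (the rulebook entries are my own invention):

    # The 'rulebook': squiggles in, squiggles out, no comprehension needed.
    RULEBOOK = {
        "你好吗?": "我很好, 谢谢.",
        "今天天气好": "是的, 很晴朗.",
    }

    def room_operator(symbols):
        # Match the incoming shapes, copy out the listed reply.
        return RULEBOOK.get(symbols, "请再说一遍.")

    print(room_operator("你好吗?"))

The dispute is over where, if anywhere, understanding lives once the rulebook gets large enough to hold up its end of any conversation.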
I'm of the opinion that the room does have an understanding entity inside of it: in talking about infinitely-sized books with an infinitely-sized index, allowing any mechanical process to map an input to a correct output, you've hypothesized something complicated enough that it should be said to be an entity capable of understanding/meaning.
The problem with Searle's assertion is that he is making a distinction between the computer and the program. We, as human beings, are not our computers; we are our programs.
> Isn't this basically an argument over John Searle's Chinese Room thought experiment?
No, this debate is completely and utterly different from Searle's Chinese Room argument! Searle's argument is a philosophical one for the assertion that a computer could never be a person. He concludes that this is true even if we were to eventually believe that we completely understand intelligence in the manner that Chomsky is lobbying for, and then fully implement that understanding in a computer.
For Searle, no amount of understanding of intelligence in any form will ever let us make an intelligent computer. For Searle, intelligent beings must be made out of flesh and bone. Or at least not out of anything digital and computer-like.
Is a robot capable of running? Let's say you had one; then take its legs and give them to an amputee. The amputee can operate the legs by pressing a button. However, pressing a button is not running. Similarly, the robot would not really be running either.
Intellectually, there seems to be something as wrong with avoiding anthropomorphism when discussing human endeavors (such as language) as there is with anthropomorphic explanations of erosion or chemical reactions. Skinnerian approaches to language may leave people unsatisfied because there is no story, just clinical observation.
Norvig's approach (as characterized in the article) takes the "Artificial" in "Artificial Intelligence" to include the mechanism by which an intelligence makes decisions. Chomsky's aesthetic of linguistics applied to AI would treat "Artificial" as a description of the platform in which an intelligence is embodied (i.e. non-biological) while requiring the platform to operate linguistically on the same principles as a "natural intelligence."
Norvig's approach (as characterized in the article) is essentially a better Eliza (or Ford's faster horse).
If one takes the Turing Test as scientifically meaningful rather than as an engineering standard, then one falls in one camp or the other and the Norvig-Chomsky debate is over a pseudo-problem. "Artificial Intelligence" is in that sense metaphysical jargon.
Skinner's book Verbal Behavior was mostly unsatisfying because it didn't have a lot of data; it really just laid out a research program which had not been carried out in any significant way (and now, never will be). Of course it is also unsatisfying that Skinner does not appeal to our sense that we already understand everything important about psychology and language "from the inside" and don't really need any stinking data.
The reason most people are unsatisfied with Skinner's approach to language is that they did not read Verbal Behavior, but rather Chomsky's review; and because Chomsky chose it (as among Skinner's weakest work) and reviewed it in the most uncharitable way possible, without understanding any of the basic concepts or motivations of Skinner's approach.
So, for example, he successfully associates Skinner directly with Watson, and makes it out that "radical" behaviorism is radical not for its rejection of premises of classical behaviorism but for being even more crazy.
That review is a masterpiece of propaganda and it effectively prevented Skinner's basic ideas from even being seriously evaluated ever again.
OK, let me start with two facts, one objective, one personal: (i) Noam Chomsky is a genius with many contributions to linguistics and computer science; (ii) I think his overall influence has been damaging to linguistics.
Here's a summary of Chomsky's career in layman's terms: As everyone knows, Chomsky first came to prominence with his critique of Skinner (who, as everyone also knows, was a total psycho). He pretty much created linguistics as we know it (at least in the US; there were some numbskulls in Europe who still doubted the new order), starting from the main thesis of linguistic universals, which can be summarized as the fact that all humans possess the same language faculty, i.e. the wide range of linguistic differences between, say, English and Mandarin are just on the surface. This was a welcome relief against the Sapir-Whorf mumbo-jumbo which held that Eskimos had hundreds of words for snow and that language constrained how we think. Chomsky has also been very active in politics (he's actually much better known to the general world for his political books), pointing out the evils especially of the American brand of capitalism (is there any other kind?) and its corrosive influence on the world, e.g. Iraq, Afghanistan, etc. He also points out errors in certain approaches in Economics, e.g. see http://en.wikiquote.org/wiki/Noam_Chomsky#Capitalism, without holding a degree in the field, but everybody does that.
Chomsky's greatly damaging influence on linguistics is due to the fact that his speculative and simplistic (at least originally) views on how the brain processes and learns language have stifled research in promising fields for decades. The main problem I have with him is that the cause of the shortcomings of his theory seems to be not lack of knowledge (very little was known about cognition in the 60s), which of course handicaps all pioneers of science, but politics (I detest politically motivated scientific theories). AFAIK, his universalist views were motivated by his political beliefs.
So, what does all this mean for the current debate? I think it's time to retire the "old guard"! Let us acknowledge their breakthroughs and their contributions, but also their limitations, and move on.
Well, in the big picture, Chomsky created an activity which keeps linguists very busy. His approach, however, has contributed very little to language engineering.
Historically, AI has been divided into two related but different approaches. "Strong" AI is interested in understanding and creating Minds; figuring out what intelligence is, how it works, how we do it, and how it could be done in general. "Weak" AI is interested in doing things that couldn't be done before; things that we do not have good algorithms for, or don't have any algorithms at all.
Those two are not opposed. Any advance on either side helps the other. In this argument, Norvig is representing an extreme version of weak AI since he seems to be arguing that it's possible that statistical methods are all there is. (I suspect that he isn't actually making that argument, though, but that strong AI's models are currently too simplistic to capture what statistical approaches can do.) Chomsky, on the other hand, seems to be caricaturing strong AI by saying that anything that doesn't directly shed light on the Grand Theory is worthless.
It's a question about engineering vs science. Before Kepler, people actually could predict the motion of the stars and planets through the sky; perhaps not as elegantly or accurately as after Kepler, but to a certain degree, so what?
The AI case is clearly a point where the theories from linguistics are insufficient for engineering purposes. Watson could not have been built today based on Chomskyan linguistics. Maybe the statistical models will advance the theory of linguistics, maybe not. Either way, they give us useful tools now, which is better than elegant tools later.
AI fields, including speech recognition and machine vision, are currently ENGINEERING disciplines trying to make artifacts that do interesting things. Success is an artifact that works.
Several basic science disciplines are trying to understand how brains work. There is mostly a tremendous amount of experimental fact, difficult to put together, and some theory and modelling to go with it.
Norvig would be confused if he thinks that engineering AI systems automatically produces models useful for understanding the brain. If there is application to understanding brains, it is a welcome accident. It happens that there are signals in the basal ganglia that look like the temporal difference error signal from reinforcement learning. So maybe RL research can help understand some brain circuitry in that case.
But in general the engineers are trying to get stuff to work, and they are deluded if they think they are simultaneously making progress in understanding how brains work.
EDIT:
For example: why does speech recognition use hidden Markov models and N-gram language models? Because they're the best model of how brains understand speech? No! Not at all. HMMs and N-gram models are above all computationally tractable: easy to implement, not too slow to run.
We have algorithms (such as Baum-Welch and N-gram smoothing techniques) to get them to work well in engineering applications. Nothing more. Might they help us understand brains? Maybe, but not at all necessarily so.
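To illustrate the tractability point: the forward algorithm scores an observation sequence against an HMM in O(T*N^2) time with plain dynamic programming - a very computer-friendly property that nobody claims is neuroscience. A tiny sketch with made-up parameters:

    # A 2-state HMM with invented parameters.
    states = (0, 1)
    start = [0.6, 0.4]
    trans = [[0.7, 0.3],
             [0.4, 0.6]]
    emit = [{"a": 0.5, "b": 0.5},   # state 0 emission probabilities
            {"a": 0.1, "b": 0.9}]   # state 1

    def forward(obs):
        # P(obs) summed over all hidden-state paths, in O(T * N^2).
        alpha = [start[s] * emit[s][obs[0]] for s in states]
        for o in obs[1:]:
            alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                     for s in states]
        return sum(alpha)

    print(forward("abba"))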
It is an interesting debate, though I think it's being cast in the wrong light.
According to the article, it almost sounds like Chomsky believes a statistical approach to AI is a disservice to the field. The point he's missing is that research in statistics-based AI is just that - statistics research.
Chomsky and Norvig deal in two different fields, which happen to have similar applications. Norvig does research in statistics and machine learning. Success in this field comes from a new model that can make more accurate predictions, or a proof that it is impossible to make valid predictions about X with only Y as input. Applications of this field include technologies which rival AI systems as envisioned by Chomsky, but the essential point is that this field focuses on statistics research, not AI research.
Chomsky is wrong in dismissing this as a disservice. I do agree with his main point, that AI research and knowledge is not necessarily furthered by statistics research, but that is simply because they are different beasts entirely.
Maybe one day, when the biology has caught up with us and we have a solid understanding of the brain, we will be able to create a highly intelligent computer. Until then, statistics research is most likely to yield fruitful results.
Copernicus's theory did NOT do away with epicycles. Search on Google for "copernicus epicycle" and the first article demonstrates my point. The one who did away with epicycles was Kepler. Copernicus believed orbits had to be perfectly circular; Kepler recognized that the data fit better into an elliptical model.
It's not 100% clear whether the author believed the "myth," but hopefully I can set some people straight in this forum.
The main problem with Chomsky's approach is that it is quite likely that the mechanics of human intelligence are simply incomprehensible to a human intelligence - not because of some crazy construction tricks, but because of the plain old brute size and complexity involved.
Judging from much simpler (and thus more deeply investigated) biological systems like some bacterial metabolisms, we can see that there is no grand design there, only a trivial primitive core and numerous layers of more or less subtle modifiers of modifiers. IMO there is no reason why the same can't hold for the brain, and thus the "transition to sentience" is way more continuous than we would like to expect.
[+] [-] knowtheory|13 years ago|reply
Chomsky's past his heyday. He has been seminal in his field, but he's no longer doing research which pushes at the boundaries of our understanding of language, how to model it, or what the fundamental nature of language understanding systems is. (as one might infer, I come from a non-chomskyian school of linguistics).
Given that we have actual data and research about large scale systems that do interesting things (including the massive artificial neural network that google built last month, see: http://www.wired.com/wiredscience/2012/06/google-x-neural-ne... ) reporting as substance free and obfuscating as this is, is a real frustration, when we could be talking about more interesting things, such as what a solid operational definition of meaning is, or how exactly heuristic/rule based systems actually differ from statistical mechanism, and whether or not all heuristic systems can (or should) be modeled with statistical systems.
The framing of this article is particularly galling because there are so many non-chomskian linguists out in the world who operate fruitfully in the statistical domain. Propping Chomsky up as somehow representative of all linguists is pretty specious and a bit irritating.
[+] [-] ntoshev|13 years ago|reply
[+] [-] mcguire|13 years ago|reply
[+] [-] gbog|13 years ago|reply
Please do.
[+] [-] phaedrus|13 years ago|reply
[+] [-] knowtheory|13 years ago|reply
This is the same argument that was used against artificial neural networks. Neural network of type A can't do X, therefore neural networks will never do Y.
Language is immensely complex, and real human language involves things which are not encoded in text (and i'd remind you that you were trying to infer meaning from text specifically, not the full multi-channel robustness of humans communicating), we don't even have a full handle on what all of the cognitive processes and factors are that go into the production and understanding of language (although we've developed a lot of interesting work to those ends).
So hearing folks give up claim that Chomsky is correct because our current tools aren't up to the job is a bit puzzling, because we don't even have a complete understanding of what sort of thing language is or what sorts of things we are as systems which can use language.
Chomsky has opinions (and some facts) about what language is, and we are, but he does not have solid proof to confirm his specific conjectures. Is human language context free? context sensitive? Something else? (Chomsky's minimalist program uses movement along a tree to preserve referentiality and a bunch of junk, alternative syntactic frameworks such as HPSG uses directed graphs as the basis of their language modeling. Still others do weirder things like higher order combinatoric logics. And unfortunately none of the theoretical frameworks appear to be without their drawbacks)
[+] [-] JabavuAdams|13 years ago|reply
(1) Raising a child. My dad often remarks that he's surprised that my daughter knows how to use a word in just the right context. I'm not, because this is a natural product of mimicry: if you copy what others say, you usually use the words in the correct context. As with computer-generated text the exceptions are often hilarious.
(2) Song lyrics. I had a very clear experience where I just could not understand the refrain of Gold Guns Girls. It sounded completely unintelligible until I read the lyrics. After that, it sounded crystal clear. Why would reading the lyrics make the song sound different? Context.
[+] [-] slurgfest|13 years ago|reply
Did you produce a better chat bot based on UG? - No? Then on what basis are you junking machine learning?
Machine translation is far from a solved problem. But Chomsky's school claimed they were going to solve it in the 60s or perhaps the 70s. Do you know what is the basis for the most successful current approach to machine translation?
Analysis of text. (But not the kind of simplistic junk one does in a Markov chatter)
All you have done is suggest that some beautiful perfect text model exists natively in every person. (Presumably this evolved somehow - or if you are Chomsky, it just developed like a crystal for no apparent reason). This isn't an explanation of anything unless you actually find that model instantiated in the brain. But this is just not happening. So either our instruments are still too crude to detect it, or it's not really there.
Appealing to an as-yet unknown perfect universal text model does not build a better chat bot or a better explanation of human behavior.
True understanding of intelligence must incorporate an understanding of how learning occurs. Because anyone who watches children sees learning occurring, and only doctrinaire Chomskyists deny that it occurs (because it is not beautiful enough and some abductive argument is claimed to show that it is not sufficient).
[+] [-] bhickey|13 years ago|reply
Where is the data that statistical methods don't 'advance our understanding'? What does an EEG tell us about the brain works?
[+] [-] bwanab|13 years ago|reply
So it sounds like what you decided was that you wanted to explore one path and not the other. Nothing wrong with that, but it's a very different statement.
[+] [-] dave_sullivan|13 years ago|reply
The state of the art is improving a bit, although this method still knows nothing of meaning so it can often generate some strange sentences. Still, I wouldn't write off the whole field yet--just because something didnt work with tools years ago doesn't mean it isn't possible.
[+] [-] guscost|13 years ago|reply
1.) Measure data, then make an educated guess. 2.) Make an uneducated guess.
[+] [-] robg|13 years ago|reply
This distinction marks the real turning point in AI from abstract, grand claims with highly restrictive evidence toward engineering that simply works. Who cares about the ontology when we can recreate? It's like saying airplanes don't properly explain flight because they don't replicate how birds do it. Who cares? We can fly (and translate and soon reason) artificially.
It's clear that Chomsky and Universal Syntax has held back the entire field of AI (and at MIT). There isn't one algorithm in the human mind to decode all of our mental capabilities. That's mistaking subjectivity for objective lessons. Trying to recreate that Phantom has led to rule tables in AI, constraints on how the mind must operate. Instead, by allowing those fuzzy boundaries to accumulate with evidence, statistical approaches win in the long-term of our lives and in this debate.
Kuhn knew what happens to dinosaurs.
[+] [-] _delirium|13 years ago|reply
In particular, statistical approaches have been used for a long time, but were not practical until fairly recently; it was the lack of "big data" computing power holding them back more than anything. Statistical machine translation and parsing experiments have been tried on and off for decades, but with 1950s-era data they produced total garbage as output, even worse than the (also bad) symbolic approaches. Hence why Shannon's work on text processing didn't produce practical NLP or NLG systems. It took Google-sized data to produce statistical translation that was actually usable.
What numerical approaches were possible on computers of the time were fairly extensively investigated when they became possible (e.g. the 1980s focus on "sub-symbolic AI", with perceptrons, neural networks, numerical regression methods, etc.). Some were shelved for years because they just didn't work as well, e.g. symbolic game-tree search massively outperformed machine learning in board games in the early experiments, which is why Samuels's 1950s ML-based checkers player was theoretically intriguing but not considered very practical.
[+] [-] JabavuAdams|13 years ago|reply
Careful here. There could be one algorithm, but we may not be able to express its parameters in the way we like to for simple models.
These statistical approaches are algorithms. It's the fact that we can't make sense of the parameters that seems to lead people to believe that they don't explain anything.
The fact that we can't categorize all life in an unambiguous way that makes sense to humans doesn't mean that the elegant and simple algorithm of evolution by natural selection is wrong.
Model vs. initial conditions. Knowledge vs. search.
[+] [-] pnathan|13 years ago|reply
Well, I do. Being able to understand something from first principle is very different than being able to model it and gain an approximation of it.
One might point out that this debate crops up in a different form in developer circles, but reframed as "Learn assembly & cs theory vs. use IDE & ignore assembly and theory".
[+] [-] fusiongyro|13 years ago|reply
That may be true, but there is value to understanding language for the sake of language outside the practical goals of improving AI. There is plenty of evidence for parameters playing a role in human language whether or not a parametric implementation of NLP is possible, desirable or necessary. It's certainly true that a statistical approach that bears plenty of fruit in AI applications is going to be very strained to provide anything of value back to linguistics or developmental psychology.
[+] [-] meric|13 years ago|reply
The analogy doesn't hold. We don't want planes to fly like birds, but we may want machines to thinking like humans.
[+] [-] rm999|13 years ago|reply
Statistical models may (in my opinion probably will) end up being an "AI" dead-end, eventually falling into other fields such as algorithms, like game trees and logic-based agents did. That's not to say the current statistical approach is a bad idea; on the contrary, I think these techniques are useful and simple enough that they will become fairly ubiquitous in CS.
On the Chomsky side of the argument, AI researchers have consistently been frustrated in the past 50 years, to the point that studying AI today makes you sound like a joke. But their goal is a noble one. Anyone can understand how great it would be to have a human-level intelligence on a chip - this would fundamentally change the World. The fact that we haven't dented this problem doesn't mean the problem isn't worth solving, it just means our understanding of what it takes to build this kind of AI is in its infancy.
I almost feel like Norvig and Chomsky are arguing in parallel. They are both right, but their arguments are valid on different time scales. Today, the Norvig approach will easily win out; Chomsky has nothing and is largely irrelevant. But Chomsky is, IMO, correctly predicting what will need to happen to move beyond an eventual roadblock in a much grander AI.
[+] [-] debacle|13 years ago|reply
Chomsky takes the academic approach - artificial intelligence is the simulation of humanlike (or even possibly mammalian) intelligence.
Norvig is taking the engineering approach - artificial intelligence needs only to pass the Turing test.
They're both right, both approaches have value, and they both are bound by our limited technology at the moment.
In the end, though, Norvig will lose out. Sure, he'll make the finish line first - an AI capable of 'passing' the Turing test, but in order to have real intelligence you need an analytical engine (or brain, if you will) that can prioritize data without fiddling with bits. In the Norvig solution, someone will always have to be fiddling with the bits.
Chomsky's approach, on the other hand, will result in a 'true' artificial intelligence, the way neurologists understand it. It's just going to take a lot longer to get there.
[+] [-] phuff|13 years ago|reply
Chomsky's central problem is that his modeling is not based on anything biological at all. His models don't correspond to reality. Some of them were based on some assumptions about how the brain works that were untestable in the 50s and 60s when all of his linguistic models were developed, and have since become testable and are not particularly evident in the way we currently understand the brain to work.
Given this fact, I think your current best bet is Norvig as a modern approach to AI or anything linguistic-y. But this is only because it is slightly more grounded in reality rather than being something that Chomsky (who is a very smart guy) came up with on his own without the benefit of actual biological models of the brain.
In the end, I think there will be (eventually, a long time from now) an actual model of how the brain processes language based on actual observations of working brains that throws away much of what Chomsky has proposed but probably uses some of it and that doesn't use huge Google-esque lookup tables but is highly influenced by statistics.
But until we get to that point, statistics are probably your best bet since at least they're grounded in reality (unlike much of Chomsky's work).
[+] [-] bugsbunnyak|13 years ago|reply
High-level behavioral impressions taken by a neurologist are a convenient abstraction. That this high-level behavior is useful in monitoring mental state (outputs) says very little about the underlying 'hardware'. In fact, this is the `fundamental` debate in cognitive science: from whence does intelligence arise? Theories generally fall under two headings: 'top-down' and 'bottom-up', which roughly correspond to 'pre-programmed' and 'emergent'. The canonical bottom-up approach is the neural network, approximating cells with various equations that govern behavior (outputs) based on aggregate input (there are various levels at which this can be done). There are a variety of top-down approaches, a typical approach would take the form of logic engines (think Prolog), or generative rules (Chomsky)
Statistical modelling approaches are closer to bottom-up, but depending on the model they may still incorporate domain knowledge that is emergent from the model input.
Statistical approaches have momentum these days due to considerable success - thanks largely to Moore's law. However, they also have biological support: what is a neuron? It's an FPGA with a lot of electrical and chemical inputs. Small neural circuits can behave statistically, and it's an open question whether this gives rise to high-level behavior. A big reason it's an open question is that we don't yet have the spatial or temporal resolution to measure enough signals.
That said, there is plenty of room for what I consider a happy medium: locally statistical behavior, but globally (and generationally) top-down organization driven by genetics.
[+] [-] _delirium|13 years ago|reply
Actually, Russell & Norvig's AI textbook has a nice summary of these different approaches to AI in its intro chapter.
[+] [-] netcan|13 years ago|reply
For a significant part of the Scientific age we knew about genes in some sense without knowing much about them. We called them traits, observed & measured them. We got to know some "rules" about their inheritance. But it wasn't until genetics got to be a little better understood that we got to know their physical manifestation. We can explain the diference between genetic & cultural (memetic?) inheritance in these terms. A descendant's cooking habits are memetic and her hair colour is genetic.
When it comes to neuroscience I think we're where we were a century ago in biology. Emotions, thoughts, memories. We don't know what their physical manifestation is. We dont know how they work. Since we don't know much about how natural intelligence works I think our common sense definition of intelligence is, to a certain extent: "stuff we can do that computers can't."
I think that if we had a definition that was more functional than observational, you wouldn't be hesitant at all to use "mammalian" in your definition. Whatever processes result in observed human intelligence are almost certainly shared with other species. We'd probably also know what species to draw the lines at: reptiles? invertebrates? fungi?
If apes have intelligence, goldfish don't but octopi do that suggests there multiple versions of natural intelligence.
[+] [-] power|13 years ago|reply
Passing the Turing test and the simulation of intelligence are supposed to be the same thing. Turing came up with the test to sidestep the definition of intelligence.
I’m not sure what you mean by “but in order to have real intelligence you need an analytical engine (or brain, if you will) that can prioritize data without fiddling with bits. In the Norvig solution, someone will always have to be fiddling with the bits”. Do you mean bits as in pieces or as in ones and zeros ? If the former, haven’t Chomsky’s models required the addition of new parameters as exceptions to his rules are found ? If you mean the latter, how is a computer to prioritize data if it can’t look at bits ?
There seems to be the misconception that Norvig does not use simple models. He does, just ones that use statistics for training and to learn. His approach strikes me as elegant, simple and more robust to changes in language over time than Chomsky’s.
[+] [-] jan_g|13 years ago|reply
[+] [-] btilly|13 years ago|reply
In fact I doubt that the cognitive capacity of a human brain is enough to truly understand an operating human brain.
[+] [-] rm999|13 years ago|reply
[+] [-] olalonde|13 years ago|reply
[+] [-] azakai|13 years ago|reply
http://norvig.com/chomsky.html
Second: I found it astounding that the article never mentions Skinner. Surely this article is trying to do to Chomsky what Chomsky did to Skinner in 1959 ("A Review of B. F. Skinner's Verbal Behavior", http://www.chomsky.info/articles/1967----.htm ).
Chomsky basically marked the beginning of modern era of cognitive psychology with that essay, displaing the previous paradigm of behaviorism. Norvig's article has similar form in some ways to that article, and similar goals (to argue for a new paradigm over an older one). As I was reading it, I was sure Norvig had that context in mind. So I was surprised to read
> So how could Chomsky say that observations of language cannot be the subject-matter of linguistics? It seems to come from his viewpoint as a Platonist and a Rationalist and perhaps a bit of a Mystic
Well, no, Chomsky explained very well why he opposed observations being the subject matter of linguistics in his 1959 essay. Skinner's behaviorism looked only at observations and experience, and did away entirely with internal mental states. That might seem bizarre to us today, and the reason is in large part the shift heralded by Chomsky's article from behavioral psychology to cognitive psychology. In the latter, the goal is to understand the internal processes that are involved in psychology (or specifically in language).
Statistical language models are not behaviorism, but they do share a lot with it: they are based primarily on raw empirical observation rather than on deep models. So it is natural for Chomsky to oppose them on similar grounds (and not due to Platonism or Rationalism, although I suppose you can speculate that those motivated his 1959 essay too).
Side note: we can speculate that if Skinner had had today's computers and statistical modelling methods, the shift from behaviorism to cognitivism might never have happened, seeing as the statistical approach is so successful.
[+] [-] orbitingpluto|13 years ago|reply
'Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the "old one." I, at any rate, am convinced that He does not throw dice.' --Einstein
Statistical methods can work but they are unsatisfying to the scientifically curious. You're not really a scientist if you create something that works and you don't really know why. (Not to say that the method doesn't have value. Sometimes you have to play with your Lego before you grow up.)
[+] [-] VikingCoder|13 years ago|reply
Until Kepler had access to Brahe's data, he was not going to be able to come up with his laws of planetary motion.
Worse than that, the laws of planetary motion present a simplistic view of the universe: what happens when a bunch of small objects orbit a very massive one. I think they wouldn't help you at all in trying to understand planets moving in a binary star system.
There is no analytic solution to the N-body problem. We can only simulate the motions of a group of massive bodies by iteratively applying the laws of gravitation that we have deduced. Knowing the mathematical properties of how objects behave in a gravitational field and actually understanding HOW GRAVITY WORKS are two enormously different things. Newton himself was frustrated with the theory of gravity because it was, like Norvig's models, just a model, with no explanation of why. But the model allows you to make falsifiable predictions and understand how the universe will behave. Looking for the Higgs boson is awesome, but there is potentially no equivalent in the linguistic world.
Chomsky asks us to ignore F = G * m1 * m2 / r^2, because there's no WHY attached to it.
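Here's what "iteratively applying the law" looks like in code (a naive sketch with made-up units and a fixed step size, not a serious integrator):

```python
# Minimal two-body sketch: no closed-form solution needed, just step the
# inverse-square law F = G*m1*m2/r^2 forward in time.
G = 1.0  # illustrative units

def step(bodies, dt):
    # bodies: list of [mass, x, y, vx, vy]
    for i, a in enumerate(bodies):
        ax = ay = 0.0
        for j, b in enumerate(bodies):
            if i == j:
                continue
            dx, dy = b[1] - a[1], b[2] - a[2]
            r = (dx * dx + dy * dy) ** 0.5
            acc = G * b[0] / (r * r)  # acceleration = F/m1, toward body b
            ax += acc * dx / r
            ay += acc * dy / r
        a[3] += ax * dt
        a[4] += ay * dt
    for a in bodies:  # positions move after all accelerations are computed
        a[1] += a[3] * dt
        a[2] += a[4] * dt

star = [1000.0, 0.0, 0.0, 0.0, 0.0]
planet = [1.0, 10.0, 0.0, 0.0, 10.0]  # v = sqrt(G*M/r) gives a near-circle
bodies = [star, planet]
for _ in range(10000):
    step(bodies, 0.001)
print(planet[1], planet[2])  # where the model, not a WHY, puts the planet
```

The same loop handles a binary star system: add a second massive body and iterate. No new theory is required, and no new understanding is gained.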
PS - this understanding of the history of science is brought to you by Carl Sagan's Cosmos TV series. I have no deeper insight than that.
[+] [-] mootothemax|13 years ago|reply
It supposes that there is a program that gives a computer the ability to carry on an intelligent conversation in written Chinese. If the program is given to someone who speaks only English, and that person executes its instructions by hand, then in theory the English speaker would also be able to carry on a conversation in written Chinese. However, the English speaker would not understand the conversation. Similarly, Searle concludes, a computer executing the program would not understand the conversation either.
http://en.wikipedia.org/wiki/Chinese_room
[+] [-] knowtheory|13 years ago|reply
I'm of the opinion that the room has an understanding entity inside of it. In talking about infinitely-sized books with an infinitely-sized index, allowing any mechanical process to map an input to a correct output, you've hypothesized something complicated enough that it should be said to be an entity capable of understanding/meaning.
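For contrast, here is the rulebook caricatured at its absolute smallest (a toy sketch with invented entries; the philosophical weight comes entirely from imagining this table scaled to every possible conversation):

```python
# Toy caricature of the Chinese Room rulebook: a pure symbol-in,
# symbol-out lookup, followed with zero comprehension by the operator.
rulebook = {
    "你好": "你好！",             # "hello" -> "hello!"
    "你会说中文吗？": "会一点。",   # "do you speak Chinese?" -> "a little."
}

def operator(symbols):
    # The person in the room just matches shapes against the book.
    return rulebook.get(symbols, "请再说一遍。")  # fallback: "please repeat"

print(operator("你好"))
```

At this size there is clearly no understanding anywhere. The dispute is over whether that verdict survives when the book and its index become large enough to hold up one end of any conversation.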
[+] [-] nessus42|13 years ago|reply
No, this debate is completely and utterly different from Searle's Chinese Room argument! Searle's argument is a philosophical one for the assertion that a computer could never be a person. He concludes that this holds even if we eventually come to believe that we completely understand intelligence in the manner that Chomsky is lobbying for, and then fully implement that understanding in a computer.
For Searle, no amount of understanding of intelligence in any form will ever let us make an intelligent computer. For Searle, intelligent beings must be made out of flesh and bone. Or at least not out of anything digital and computer-like.
[+] [-] brudgers|13 years ago|reply
Norvig's approach (as characterized in the article) takes the "Artificial" in "Artificial Intelligence" to include the mechanism by which an intelligence makes decisions. Chomsky's aesthetic of linguistics applied to AI would treat "Artificial" as a description of the platform in which an intelligence is embodied (i.e. non-biological) while requiring the platform to operate linguistically on the same principles as a "natural intelligence."
Norvig's approach (as characterized in the article) is essentially a better Eliza (or Ford's faster horse).
If one takes the Turing Test as scientifically meaningful rather than as an engineering standard, then one falls into one camp or the other, and the Norvig-Chomsky debate is over a pseudo-problem. "Artificial Intelligence" is in that sense metaphysical jargon.
[+] [-] slurgfest|13 years ago|reply
The reason most people are unsatisfied with Skinner's approach to language is that they did not read Verbal Behavior, but rather Chomsky's review of it; and Chomsky chose that book (among Skinner's weakest work) and reviewed it in the most uncharitable way possible, without understanding any of the basic concepts or motivations behind Skinner's approach.
So, for example, he successfully associates Skinner directly with Watson, and makes it out that "radical" behaviorism is radical not for its rejection of classical behaviorism's premises but for being even crazier.
That review is a masterpiece of propaganda and it effectively prevented Skinner's basic ideas from even being seriously evaluated ever again.
[+] [-] Jun8|13 years ago|reply
Here's a summary of Chomsky's career in layman's terms: As everyone knows, Chomsky first came to prominence with his critique of Skinner (who, as everyone also knows, was a total psycho). He pretty much created linguistics as we know it (at least in the US; there were some numbskulls in Europe who still doubted the new order), starting from the main thesis of linguistic universals, which can be summarized as the fact that all humans possess the same language faculty, i.e. the wide range of linguistic differences between, say, English and Mandarin are just on the surface. This was a welcome relief from the Sapir-Whorf mumbo-jumbo which held that Eskimos had hundreds of words for snow and that language constrained how we think. Chomsky has also been very active in politics (he's actually much better known to the general world for his political books), pointing out the evils especially of the American brand of capitalism (is there any other kind?) and its corrosive influence on the world, e.g. Iraq, Afghanistan, etc. He also points out errors in certain approaches in economics, e.g. see http://en.wikiquote.org/wiki/Noam_Chomsky#Capitalism, without holding a degree in the field, but everybody does that.
Chomsky's greatly damaging influence on linguistics is due to the fact that his speculative and simplistic (at least originally) views on how the brain processes and learns language have stifled research in promising fields for decades. The main problem I have with him is that the cause of the shortcomings of his theory seems to be not lack of knowledge (very little was known about cognition in the 60s), which of course handicaps all pioneers of science, but politics (I detest politically motivated scientific theories). AFAIK, his universalist views were motivated by his political beliefs.
Luckily, starting in the 90s, Chomsky's chokehold on linguistics has slipped somewhat. Researchers such as Leda Cosmides have ventured into research on linguistic relativity (http://en.wikipedia.org/wiki/Linguistic_relativity). Skinner's theories are making a comeback in academic circles (http://www.theatlantic.com/magazine/archive/2012/06/the-perf...).
So, what does all this mean for the current debate? I think it's time to retire the "old guard"! Let us acknowledge their breakthroughs and contributions, but also their limitations, and move on.
[+] [-] mcguire|13 years ago|reply
Those two are not opposed; any advance on either side helps the other. In this argument, Norvig represents an extreme version of weak AI, since he seems to be arguing that it's possible that statistical methods are all there is. (I suspect he isn't actually making that argument, though, but rather that strong AI's current models are too simplistic to capture what statistical approaches can do.) Chomsky, on the other hand, seems to be caricaturing strong AI by saying that anything that doesn't directly shed light on the Grand Theory is worthless.
[+] [-] aidenn0|13 years ago|reply
The AI case is clearly one where the theories from linguistics are insufficient for engineering purposes. Watson could not have been built today based on Chomskian linguistics. Maybe the statistical models will advance the theory of linguistics, maybe not. Either way, they give us useful tools now, which is better than elegant tools later.
[+] [-] frobbin|13 years ago|reply
Several basic science disciplines are trying to understand how brains work. There are mostly tremendous amounts of experimental facts, difficult to put together, and some theory and modelling to go with them.
Norvig would be confused if he thought that engineering AI systems automatically counts as building models useful for understanding the brain. If there is an application to understanding brains, it is a welcome accident. It happens that there are signals in the basal ganglia that look like the temporal-difference error signal from reinforcement learning, so maybe RL research can help us understand some brain circuitry in that case.
But in general the engineers are trying to get stuff to work, and they are deluded if they think they are simultaneously making progress in understanding how brains work.
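For the curious, the temporal-difference error mentioned above is a one-line quantity. A sketch (a made-up three-state chain; the learning rate and discount are arbitrary):

```python
# TD(0) value learning on a toy chain of states 0 -> 1 -> 2 (terminal).
# The td_error below is the signal said to resemble dopamine responses.
alpha, gamma = 0.1, 0.9
V = {0: 0.0, 1: 0.0, 2: 0.0}
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]  # (state, reward, next)

for _ in range(200):
    for s, r, s_next in episode:
        target = r + (gamma * V[s_next] if s_next is not None else 0.0)
        td_error = target - V[s]  # mismatch between prediction and outcome
        V[s] += alpha * td_error
print(V)  # value propagates backward from the rewarded terminal state
```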
EDIT:
For example: why does speech recognition use hidden Markov models and N-gram language models? Because they're the best model of how brains understand speech? No! Not at all. HMMs and N-gram models are above all computationally tractable: easy to implement, not too slow to run.
We have algorithms (such as Baum-Welch and N-gram smoothing techniques) to get them to work well in engineering applications. Nothing more. Might they help us understand brains? Maybe, but not at all necessarily so.
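"Computationally tractable" is no exaggeration; a toy N-gram language model is little more than counting and division. A minimal bigram sketch with add-one smoothing (the corpus is invented, and real systems use far better smoothing methods):

```python
from collections import Counter

# Toy bigram language model: count adjacent word pairs, smooth the
# counts, and score new sentences by multiplying conditional probabilities.
corpus = "the dog barks . the cat meows . the dog runs .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab_size = len(unigrams)

def prob(prev, word):
    # Add-one (Laplace) smoothing, the crudest of the techniques alluded
    # to above; it keeps unseen pairs from getting zero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(words):
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= prob(prev, word)
    return p

print(sentence_prob("the dog barks .".split()))  # seen pairs: higher
print(sentence_prob("the cat barks .".split()))  # unseen pair: lower
```

Nothing in it claims to describe a brain; it's arithmetic that happens to be cheap.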
[+] [-] fat_clown|13 years ago|reply
According to the article, it almost sounds like Chomsky believes a statistical approach to AI is a disservice to the field. The point he's missing is that research in statistics-based AI is just that: statistics research.
Chomsky and Norvig deal in two different fields which happen to have similar applications. Norvig does research in statistical and machine learning. Success in this field comes from a new model that can make more accurate predictions, or from a proof that it is impossible to make valid predictions about X with only Y as input. Applications of this field include technologies which rival AI systems as envisioned by Chomsky, but the essential point is that this field centers on statistics research, not AI research.
Chomsky is wrong in dismissing this as a disservice. I do agree with his main point, that AI research and knowledge is not necessarily furthered by statistics research, but that is simply because they are different beasts entirely.
Maybe one day, when the biology has caught up and we have a solid understanding of the brain, we will be able to create a highly intelligent computer. Until then, statistics research is the most likely to yield fruitful results.
[+] [-] no_more_death|13 years ago|reply
Copernicus's theory did NOT do away with epicycles. Search on Google for "copernicus epicycle" and the first article demonstrates my point. The one who did away with epicycles was Kepler. Copernicus believed orbits had to be perfectly circular; Kepler recognized that the data fit better into an elliptical model.
It's not 100% clear whether the author believed the "myth," but hopefully I can set some people straight in this forum.
[+] [-] mbq|13 years ago|reply