top | item 24179795

Recent Advances in Natural Language Processing

278 points | saadalem | 5 years ago | deponysum.com | reply

176 comments

[+] rvense|5 years ago|reply
I think the point about language being a model of reality was interesting. I have an MA in linguistics including some NLP from about a decade ago and was looking at a career in academic NLP. I ultimately left to become a programmer because (of life circumstances and the fact that) I didn't see much of a future for the field, precisely because it was ignoring the (to me) obvious issues of written language bias, ignorance of multi-modality and situatedness etc. that are brought up in this post.

All of these results are very interesting, but I'm not really feeling like we've been proved wrong yet. There is a big question of scalability here, at least as far as the goal of AGI goes, which the author also admits:

> Of course everyday language stands in a woolier relation to sheep, pine cones, desire and quarks than the formal language of chess moves stands in relation to chess moves, and the patterns are far more complex. Modality, uncertainty, vagueness and other complexities enter but the isomorphism between world and language is there, even if inexact.

This woolly relation between language and reality is well-known. It has been studied in various ways in linguistics and the philosophy of language, for instance by Frege and not least Foucault and everything after. I also think many modern linguistic schools take a very different view of "uncertainty and vagueness" than I sense in the author here, but they are obviously writing for a non-specialist audience and trying not to dwell on the subject.

My point is, when making and evaluating these NLP methods and the tools they are used to construct, it is extremely important to understand that language models social realities rather than any single physical one. It seems to me all too easy, coming from formal grammar or pure stats or computer science, to rush into these things with naive assumptions about what words are or how they mean things to people. I dread to think what will happen if we base our future society on tools made in that way.

[+] joe_the_user|5 years ago|reply
> My point is, when making and evaluating these NLP methods and the tools they are used to construct, it is extremely important to understand that language models social realities rather than any single physical one.

I'd claim actual language involves both a general shared reality and quite concrete, specific discussions of single physical and logical facts/models. Some portion of language certainly looks mostly like a stream of associations. But within it there are also references to physical reality and a world model, and the two are complexly intertwined (making logical sense is akin to a "parity check" - you can go for a while without it, but then you have to look at the whole to get it). I believe one can see this in a GPT paragraph, where the first two sentences seem intelligent and well written, but the third sentence contradicts the first two sufficiently that one's mind isn't sure "what's being said" (and even here, our "need for logic" might be loose enough that we only notice the "senselessness" after the third logical error).

[+] skybrian|5 years ago|reply
These tools seem to be getting pretty good at fiction. In particular, playing around with AI Dungeon, it doesn't believe anything, or alternately you could say it believes everything at once. It's similar to a library that contains books by different authors, some fact, some fiction. Contradictions don't matter. Only local consistency matters, and even then, not that much.

Unfortunately, many people want to believe that they are being understood. But, on the bright side, this stuff seems artistically useful? Entertainment is a big business.

[+] hinkley|5 years ago|reply
It's probably worse than that.

It's a model of reality as filtered through the human brain, with all of its neuroses, most of which we have only begun to model.

Aren't we stuck on things like sarcasm? How are you going to model everything from confirmation bias to undisclosed PTSD, in order to have a prayer of filtering the noise from actionable information?

[+] fernly|5 years ago|reply
> it is extremely important to understand that language models social realities rather than any single physical one...

Oh gosh yes. Gorge the ML tool with the right streams of text and you'd have a glib anti-vax, white supremacist, flat-earth generator pouring out endless logorrheic twaddle.

Oh wait... could that already... it would explain so much...

[+] FiberBundle|5 years ago|reply
I found the science exam results interesting and skimmed the paper [1]. They report an accuracy of >90% on the questions. What I found puzzling was a section in the experimental results where they test the robustness of the results using adversarial answer options: they used a simple heuristic to choose 4 additional answer options from the options of other questions so as to maximize 'confusion' for the model. This resulted in a drop of more than 40 percentage points in the model's accuracy. I find this extremely puzzling - what do these models actually learn? Clearly they don't actually learn any scientific principles.

[1] https://arxiv.org/pdf/1909.01958.pdf
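To make the setup concrete, the selection heuristic can be caricatured in a few lines. Everything below is illustrative: `word_overlap_score` is a made-up stand-in for the real model's answer score, not anything from the paper.

```python
# Illustrative sketch of adversarial distractor selection: from a pool of
# answer options harvested from *other* questions, keep the k options the
# scoring function rates highest, i.e. the most "confusing" ones.
# word_overlap_score is a toy stand-in for a real model's confidence.

def word_overlap_score(question, option):
    """Toy confidence score: number of words shared with the question."""
    return len(set(question.lower().split()) & set(option.lower().split()))

def adversarial_options(question, distractor_pool, k=4):
    # Rank candidate distractors by how strongly the scorer prefers them.
    ranked = sorted(distractor_pool,
                    key=lambda o: word_overlap_score(question, o),
                    reverse=True)
    return ranked[:k]

pool = ["the water cycle", "a rain gauge", "cell division", "gravity"]
picked = adversarial_options("which instrument measures rain", pool, k=2)
# "a rain gauge" shares a word with the question, so it ranks first
```

A real model's scorer would of course be far subtler, but the shape of the attack - pick distractors the model itself finds attractive - is the same.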

[+] wrs|5 years ago|reply
I would be interested in hearing the results from humans presented with adversarial answer options. You may say that a machine learning correlations between words isn't really learning science, but I wonder how many human students aren't really learning science either, and are just learning correlations between words to pass tests...
[+] 0-_-0|5 years ago|reply
Adversarial training means that they specifically search for answers that the network would misunderstand. If this only leads to a 40 percentage point drop (the network still answers correctly about 50% of the time), I still consider that remarkable.

Choosing the best from 8 answers where some of them were adversarially derived should be equivalent to choosing the best from all possible answers, of which there could be tens of thousands. How would a human do in that situation?

Although the kinds of mistakes the network makes seem like mistakes you would never make (i.e. you wouldn't call the condition of air outdoors a gradient), the opposite could also be true: it might easily answer questions you would have a problem with.

[+] teej|5 years ago|reply
That’s the thing. Machine Learning is a misnomer, the models don’t “learn” anything about the domain they operate in. It’s just statistical inference.

A dog can learn to turn left or to turn right for treats. But they don’t understand the concept of “direction”, their brain isn’t wired that way.

Machine learning models perform tricks for treats. The tricks they do get more impressive by the day. But don’t be deceived, they aren’t wired to gain knowledge.

[+] MiroF|5 years ago|reply
Adversarial examples probably exist for humans too; they are just harder to find, since we can't easily backprop through human judgements the way we can through a bunch of array multiplications and dot products.

Doesn't mean that they are not "actually" learning any more than we are.
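The asymmetry here can be made concrete: for a differentiable model, the gradient of the loss with respect to the input is available in closed form, so a maximally damaging small nudge can simply be computed (the fast gradient sign method). A toy sketch with a fixed logistic "judge" - made-up weights, nothing trained:

```python
import math

# Toy differentiable "judge": a logistic unit with fixed, made-up weights.
W = [1.0, -2.0, 0.5]
B = 0.1

def predict(x):
    z = sum(wi * xi for wi, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

# Gradient of the log-loss (true label y = 1) with respect to the INPUT:
# d/dx [-log p(x)] = (p - 1) * w
def input_gradient(x, y=1.0):
    p = predict(x)
    return [(p - y) * wi for wi in W]

# Fast gradient sign step: nudge each input the way that most hurts the model.
def fgsm(x, eps=0.25):
    return [xi + eps * (1.0 if gi > 0 else -1.0)
            for xi, gi in zip(x, input_gradient(x))]

x = [0.5, 0.1, 0.3]
p_before = predict(x)        # fairly confident "yes"
p_after = predict(fgsm(x))   # confidence drops after one computed nudge
```

For a human judge there is no `input_gradient` to call; you can only probe with examples from the outside, which is exactly why human adversarial examples are harder to find.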

[+] rland|5 years ago|reply
> Models are transitive- if x models y, and y models z, then x models z. The upshot of these facts are that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world.

This right here is a great way of putting the success of GPT-3 into context. We think GPT is smart, because when it says something eerily human-like we apply our model of the world onto what it is saying. A conversation like this:

> Me: So, what happened when you fell off the balance beam?

> GPT: It hurt.

> Me: Why'd it hurt so bad?

> GPT: The beam was high up and I fell awkwardly.

> Me: Wow, that sounds awful.

In this conversation, one of us is thinking far harder than the other. GPT can have conversations like this now, which is impressive. But only I can model the beam, the fall, and the physical reality. When I say "that sounds awful," I actually do a miniature physics simulation in my head, imagining losing my balance and falling off a high beam, landing, the physical pain, etc. GPT does none of that. In either case, when it asks the question or when it answers it, it is entirely ignorant of this sort of "shadow" model that's being constructed.

Generalizing a bit, our "shadow" model of reality in every single domain is far more powerful than language's approximation. That's why we won't be able to use GPT to do a medical diagnosis or create a piece of architecture or whatever else people are saying it's going to do now.

[+] lmm|5 years ago|reply
> In this conversation, one of us is thinking far harder than the other. GPT can have conversations like this now, which is impressive. But only I can model the beam, the fall, and the physical reality. When I say "that sounds awful," I actually do a miniature physics simulation in my head, imagining losing my balance and falling off a high beam, landing, the physical pain, etc. GPT does none of that. In either case, when it asks the question or when it answers it, it is entirely ignorant of this sort of "shadow" model that's being constructed.

GPT-3 could presumably write a paragraph like that one. You can claim to have a working physics model in your head, but why should I believe that unless it becomes evident from the things that you communicate to me? I've certainly met humans who could have a superficially legitimate conversation about objects in motion while harbouring enormous misconceptions about the physics involved.

Maybe the biggest takeaway from GPT-3 should be that we should raise our standards for human conversation, demanding more precise language and giving less credit to flourishes that make the meaning ambiguous.

[+] dwohnitmok|5 years ago|reply
The author's quote is asserting the opposite of what you're saying here.

Filling in "GPT" for x, "text" for y, and "world" for z, the author is stating "if GPT models text, and text models the world, then GPT models the world."

In particular, the author directly addresses your point that "Generalizing a bit, our 'shadow' model of reality in every single domain is far more powerful than language's approximation."

> Modality, uncertainty, vagueness and other complexities enter but the isomorphism between world and language is there, even if inexact.

My own personal view leans towards the author's. GPT-3, despite its many astonishing achievements, has many, many limitations. Yet I don't see any indication yet that those limitations are fundamental ones inherent to the approach behind the GPT series, which is why GPT-3 both excites and frightens me.
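The transitivity being leaned on is just composition of mappings. A trivial sketch, with all data made up for illustration: one table maps internal ids to words, another maps words to worldly referents, and composing them maps ids to referents with no extra machinery.

```python
# "Models compose": if x models y, and y models z, then x models z.
ids_to_words = {0: "rain", 1: "gauge"}                 # x -> y
words_to_things = {"rain": "falling water",            # y -> z
                   "gauge": "a measuring device"}

def compose(f, g):
    """Chain two lookup tables: (g . f)[k] = g[f[k]]."""
    return {k: g[v] for k, v in f.items()}

ids_to_things = compose(ids_to_words, words_to_things)  # x -> z, for free
```

The interesting question, which the thread is really arguing about, is how lossy each of the two mappings is in the natural-language case.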

[+] ilaksh|5 years ago|reply
Right. I recently saw a podcast with Dileep George of Vicarious AI. I don't know if his specific techniques will work fully generally but at the high level he is talking about something quite similar to what you are saying in terms of grounding language understanding and having real models of the world. So I am definitely following him.

More broadly there is a growing group of researchers who are working on trying to achieve better world modeling (although of course there have been many for decades, it just seems that the number who are coming from deep learning towards AGI with more emphasis on world modeling is increasing).

Such as Ferran Alet - sort of the Tenenbaum school, I guess. Or Melanie Mitchell, who has been saying similar things for quite a while. They have recent talks on GoodAI's YouTube channel.

[+] mqus|5 years ago|reply
Not a single mention of whether this applies only to English or to other natural languages as well. Afaict this mostly lists advancements in ELP (English language processing). Especially the Winograd schema (or at least the given example) seems to be heavily focused on English.

Relevant article for this problem: https://news.ycombinator.com/item?id=24026511

[+] MiroF|5 years ago|reply
But there's no reason the models are english specific...
[+] skybrian|5 years ago|reply
Darn, based on the title, I was hoping for an overview of recent research.

Lots of people are having fun playing with GPT-3 or AI Dungeon, myself included, but it seems like there is other interesting research going on like the REALM paper [1], [2]. What should I be reading? Why aren't people talking about REALM more? I'm no expert, but it seems like keeping the knowledge base outside the language model has a lot going for it?

[1] https://ai.googleblog.com/2020/08/realm-integrating-retrieva... [2] https://arxiv.org/abs/2002.08909

[+] MiroF|5 years ago|reply
I think there's a pretty large gap between what people in academic NLP/industry labs find interesting and what the bitcoin/VC/whatever people on twitter seem to find interesting.

realm is definitely cool, and i agree that techniques like this are the way forward.

[+] YeGoblynQueenne|5 years ago|reply
Separate comment because separate point.

To my mind, all this benchmark-beating is just pointless faffing about and not a little frustrating to see that this is where the current state of the art is at. NLP systems keep getting better and better at beating benchmarks and yet we still don't have artificial agents that can hold an intelligent conversation, even in a restricted context.

Or rather, we have had such programs since the late 1960s, with none other than Terry Winograd's own doctoral thesis system, SHRDLU [1]. But it seems that in the race to build powerful classifiers, we forgot how to build systems that can interact intelligently with a human.

Here's a small excerpt from SHRDLU (taken from the wikipedia page on SHRDLU, originally in an appendix in Terry Winograd's thesis):

    Person: Is there a large block behind a pyramid?
    Computer: YES, THREE OF THEM: A LARGE RED ONE, A LARGE GREEN CUBE, AND THE BLUE ONE.
    Person: Put a small one onto the green cube which supports a pyramid.
    Computer: OK.
    Person: Put the littlest pyramid on top of it.
    Computer: OK.
    Person: Does the shortest thing the tallest pyramid's support supports support anything green?
    Computer: YES, THE GREEN PYRAMID.
    Person: What color is it?
    Computer: BY "IT", I ASSUME YOU MEAN THE SHORTEST THING THE TALLEST PYRAMID'S SUPPORT SUPPORTS.
    Computer: RED
Note that this is not just mindless text generation. The conversation is held in the context of a "blocks world" where a robot with a single hand and eye ("Computer" in the excerpt above) moves blocks of various shapes and colours around, as directed by a human user in free-form natural language. When the Computer says "OK" after being directed to "put the littlest pyramid on top of it", it's because it really has grabbed the smallest pyramid in the blocks world and placed it on top of the small block mentioned in an earlier sentence, as the Person asked. The program has a memory module to keep track of what anaphoric expressions like "it", "one" etc. refer to throughout the conversation.

SHRDLU was a traditional program hand-crafted by a single PhD student - no machine learning, no statistical techniques. It included, among other things, a context-free grammar (!) of natural English and a planner (to control the robot's hand), all written in Lisp and PLANNER. In its limited domain, it was smarter than anything ever created with statistical NLP methods.
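That memory module is easy to caricature in a few lines. The toy below just resolves "it" to the most recently mentioned entity - a vast simplification of what SHRDLU actually did, which also applied syntactic and semantic constraints:

```python
# Toy anaphora resolution in the spirit of SHRDLU's memory module
# (illustrative only; the real system used far richer machinery).

class Memory:
    def __init__(self):
        self.mentions = []  # entities in order of mention, most recent last

    def mention(self, entity):
        self.mentions.append(entity)

    def resolve(self, pronoun):
        """Resolve "it" to the most recently mentioned entity."""
        if pronoun == "it" and self.mentions:
            return self.mentions[-1]
        return None

m = Memory()
m.mention("the green cube")
m.mention("the littlest pyramid")
referent = m.resolve("it")  # -> "the littlest pyramid"
```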

______________________

[1] https://en.wikipedia.org/wiki/SHRDLU

[+] liuliu|5 years ago|reply
We have known for a long time that hand-crafted programs in limited domains can work for NLP, computer vision and voice recognition. The challenge is always that the limited domain can be extremely limited, and getting anything practically interesting requires a lot of human involvement to encode the world (expert systems).

Statistical methods traded that away. With data, some labelled, some unlabelled and some weakly-labelled, we can generate these models with much more efficient use of human involvement (refining the statistical models and labelling data).

I honestly don't see the frustration. Yes, current NLP models may not yet be the "intelligent agent" everyone is looking for, to any extent. But claiming it is all faffing and no better than the 1960s is quite a stretch.

[+] quotemstr|5 years ago|reply
Why is it surprising that a CFG can approximate a subset of English grammar?
[+] YeGoblynQueenne|5 years ago|reply
>> The Winograd schema test was originally intended to be a more rigorous replacement for the Turing test, because it seems to require deep knowledge of how things fit together in the world, and the ability to reason about that knowledge in a linguistic context. Recent advances in NLP have allowed computers to achieve near human scores:(https://gluebenchmark.com/leaderboard/).

The "Winograd schema" in Glue/SuperGlue refers to the Winograd-NLI benchmark which is simplified with respect to the original Winograd Schema Challenge [1], on which the state-of-the-art still significantly lags human performance:

> The Winograd Schema Challenge is a dataset for common sense reasoning. It employs Winograd Schema questions that require the resolution of anaphora: the system must identify the antecedent of an ambiguous pronoun in a statement. Models are evaluated based on accuracy.

> WNLI is a relaxation of the Winograd Schema Challenge proposed as part of the GLUE benchmark and a conversion to the natural language inference (NLI) format. The task is to predict if the sentence with the pronoun substituted is entailed by the original sentence. While the training set is balanced between two classes (entailment and not entailment), the test set is imbalanced between them (35% entailment, 65% not entailment). The majority baseline is thus 65%, while for the Winograd Schema Challenge it is 50% (Liu et al., 2017). The latter is more challenging.

https://nlpprogress.com/english/common_sense.html

There is also a more recent adversarial version of the Winograd Schema Challenge called Winogrande. I can't say I'm on top of the various results, so I don't know the state of the art, but it's not yet "near human", not without caveats (for example, Wikipedia reports 70% accuracy on 70 problems manually selected from the original WSC).

__________

[1] https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492

[+] bloaf|5 years ago|reply
I know that there are allegedly NLP algorithms for generating things like articles about sports games. I assume they have something more like the type signature (timeline of events) -> (narrative about said events)

What this article is about is more (question/prompt) -> (answer/continuation of prompt)

Does anyone know if there is progress in the (timeline of events) -> (narrative about said events) space?

[+] 082349872349872|5 years ago|reply
For an intermediate goal on the way to sports games, the financial press version of (timeline of events) -> (narrative about said events) could be tackled as a memoryless system.
[+] walleeee|5 years ago|reply
> A lot of the power of the thought experiment hinges on the fact that the room solves questions using a lookup table, this stacks the deck. Perhaps we be more willing to say that the room as a whole understood language if it formed an (implicit) model of how things are, and of the current context, and used those models to answer questions.

Some define intelligence (entirely separately from consciousness) precisely as the ability to develop an internal model. Coupled to a regulatory feedback, the system can then modify itself in response to some set of internal and/or external conditions (Joscha Bach, for instance, suggests consciousness is a consequence of extremely complex self-models).
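That definition can be sketched minimally: a system keeps an internal estimate of the world, updates it against observations (the regulatory feedback), and acts to close the gap to a goal. Toy numbers throughout, purely illustrative:

```python
# Toy "internal model + regulatory feedback" loop (made-up numbers):
# the system tracks an estimate of a quantity, corrects that estimate
# from each observation, and emits an action pushing towards a setpoint.

def regulate(observations, setpoint=20.0, gain=0.5):
    model = observations[0]  # initial internal estimate of the world
    actions = []
    for obs in observations:
        model += gain * (obs - model)      # self-correct the model
        actions.append(setpoint - model)   # act to close the remaining gap
    return model, actions

model, actions = regulate([18.0, 19.0, 22.0])
# the internal estimate tracks the observations; action sizes shrink
```

None of this implies consciousness, of course; it's just the bare "model plus feedback" skeleton the definition refers to.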

[+] ragebol|5 years ago|reply
> In my head- and maybe this was naive- I had thought that, in order to attempt these sorts of tasks with any facility, it wouldn’t be sufficient to simply feed a computer lots of text.

(Tasks here referring to questions in the New York Regent’s science exam)

Same for me.

But it makes sense, of course, that learning from text only is entirely possible. I certainly have not directly observed the answer to e.g. 'Which process in an apple tree primarily results from cell division? (1) growth (2) photosynthesis (3) gas exchange (4) waste removal'; I have been taught, from textbooks, what the answer should be.

I do have a much better grounding of what growth is, what apples and apple trees are though.

[+] _emacsomancer_|5 years ago|reply
A bit I found rather strange, on the language-side:

> This is to say the patterns in language use mirror the patterns of how things are(1).

> (1)- Strictly of course only the patterns in true sentences mirror, or are isomorphic to, the arrangement of the world, but most sentences people utter are at least approximately true.

Presumably this should really say something like "...but most sentences people utter are at least approximately true of their mental representation of the world."

[+] ascavalcante80|5 years ago|reply
NLP is great for many things, but, from my own experience as an NLP developer, machines are not even close to understanding human language. They can interpret some kinds of written speech well, but they struggle to grasp two humans speaking to each other. The progress we are making on building chatbots and vocal assistants is mainly due to the fact that we are learning how to speak to the machines, and not the contrary.
[+] laurieg|5 years ago|reply
I find it a little bit strange that there is an unspoken assumption in almost all natural language processing: that speech and text are perfectly equivalent.

All of the examples in the article work on English text, not spoken English. I would consider spoken English to be a much better "Gold standard" of natural language.

I'm really looking forward to machine translation operating purely on a speech in/speech out basis, instead of converting to text as an intermediate step.

[+] rllin|5 years ago|reply
the thing is humans have most efficiently encoded (in detail) reality in text. humans already highlight what is worth encoding about reality.

for example, you can finetune gpt-2 to have an idea of sexual biology by having it read erotica. just like how you can have a model learn the same by watching porn. but it is much more efficient to read the text, since there is much less information that is "useless"

[+] p1esk|5 years ago|reply
Note this is pre-GPT-3. In fact I expect GPT-4 will be where interesting things start happening in NLP.
[+] curiousgal|5 years ago|reply
I honestly don't get what the big deal is with NLP. So far the most useful application has been customer support chatbots, and those still don't rise to the level of an actual human who can understand the intricacies of your special request.
[+] benibela|5 years ago|reply
Rather than a generator, I could use a good verifier, i.e., an accurate grammar checker
[+] narag|5 years ago|reply
Has it ever happened that a "thought experiment" became a real experiment?
[+] exo-pla-net|5 years ago|reply
Not sure exactly what you mean, but Einstein developed relativity largely through thought experiments. And relativity has been verified by real experiments.
[+] simonh|5 years ago|reply
In a sense every experiment starts as a thought experiment.
[+] jvanderbot|5 years ago|reply
I'd go one step further: humans themselves don't understand anything; we are just good at constructing logical-sounding (plausible, testable) stories about things. These are mental models, and they are the only way we can make reasonable predictions within the error tolerances of our day-to-day experience, but they are flat-out lies, stories we tell ourselves, not based on a high-fidelity understanding of anything.

Rumination, deep thinking, etc is simply actor-critic learning of these mental models for story-telling.

[+] soulofmischief|5 years ago|reply
Mental models are not lies.

"The car is red" is not a lie just because I didn't phrase it internally as "The car reflects photons of a frequency of around 700nm".

We have to be able to simplify and internalize simplified models in order to make any sense of anything. It's the same reason your eyes only have a focal point in the dead center: attention requires vast amounts of processing power.

To reiterate, a simplification is not a lie. Especially not a flat-out lie.

[+] runT1ME|5 years ago|reply
Do current NLP systems understand arithmetic, and can they do it with unfamiliar numbers they've never seen? If not, I'd think that your theory is demonstrably false, as a child can extrapolate mathematical axioms from just a few example problems, whereas NLP models are not able to do so.
[+] bananaface|5 years ago|reply
I get what you're saying, but I think this is a side-effect of thinking in abstractions. We use abstractions to plug holes, to avoid having to drill down on every concept we're exposed to.

That can look like a surface-level linguistic understanding, but it's not, it's a surface-level abstraction. It's not arbitrary, it has structure, and when you flesh it out you're fleshing it out with actual abstract structure, not just painting over the gaps with arbitrary language.

[+] cscurmudgeon|5 years ago|reply
But right in this comment you claim to understand how humans understand other things.

Isn't that self-contradictory?