Announcing SyntaxNet: The World’s Most Accurate Natural Language Parser

[+] xigency|10 years ago|reply

Evidence that this is the most accurate parser is here; the previous approach mentioned is a March 2016 paper, "Globally Normalized Transition-Based Neural Networks," http://arxiv.org/abs/1603.06042

"On a standard benchmark consisting of randomly drawn English newswire sentences (the 20 year old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach."

From the original paper, "Our model achieves state-of-the-art accuracy on all of these tasks, matching or outperforming LSTMs while being significantly faster. In particular for dependency parsing on the Wall Street Journal we achieve the best-ever published unlabeled attachment score of 94.41%."

This seems like a narrower standard than described, specifically being better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal.

The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.

[+] weinzierl|10 years ago|reply

spaCy is another active open source (MIT) POS-tagger. In a previous discussion on HN[1] it was well received.

There is a simplified educational 200 lines python version [2] of it. It claims 96.8% for the WSJ corpus.

What am I missing here?

[1] https://news.ycombinator.com/item?id=8942783

[2] https://spacy.io/blog/part-of-speech-pos-tagger-in-python

[+] cbr|10 years ago|reply

    better at parsing the Penn Treebank than the best
    natural language parser for English on the Wall
    Street Journal

I'm pretty sure "the 20 year old Penn Treebank" and "the Wall Street Journal" are referring to the same dataset here. In the early 1990s the first large treebanking efforts were on a corpus from the WSJ, and they were released as the Penn Treebank: https://catalog.ldc.upenn.edu/LDC95T7 People report results on this dataset because that's what the field has been testing on (and overfitting to) for decades.

(I worked on a successor project, OntoNotes, that involved additional treebank annotation on broader corpora: https://catalog.ldc.upenn.edu/LDC2013T19)

[+] musesum|10 years ago|reply

> The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.

So you're referring to this LSTM?

"Andor et al. (2016)* is simply a SyntaxNet model with a larger beam and network. For futher information on the datasets, see that paper under the section "Treebank Union"."

After spending a few months hand coding a NLP parser, am rather intrigued by LSTM. I like the idea of finding coefficients, as opposed to juggling artificial labels.

[+] eitally|10 years ago|reply

Coincidentally, I had a parent/teacher conference with my 1st grader's teacher yesterday afternoon. Regarding reading level & comprehension, she remarked that current research indicates anything below about 98% comprehension isn't sufficient for reading "fluency". Before the past few years, the standard was 95% comprehension = fluency, but that extra few percentage points apparently make an enormous difference (probably because of colloquial & jargon edge case usages that carry specific meanings in specific contexts, but which aren't easy to programmatically detect, but that's just my supposition).

[+] unknown|10 years ago|reply

[deleted]

[+] unknown|10 years ago|reply

[deleted]

[+] wodenokoto|10 years ago|reply

The paper you mention is the world's best results and is macparseface with broader beam search and more hidden layers.

This is an opensourcing of the March 2016 method (syntaxnet, note that in the paper there are results from several trained models) as well as a trained model that is comparable in performance but faster (macparseface).

It is very hard to separate those two things from the way they write.

[+] teraflop|10 years ago|reply

This is really cool, and props to Google for making it publicly available.

The blog post says this can be used as a building block for natural language understanding applications. Does anyone have examples of how that might work? Parse trees are cool to look at, but what can I do with them?

For instance, let's say I'm interested in doing text classification. I can imagine that the parse tree would convey more semantic information than just a bag of words. Should I be turning the edges and vertices of the tree into a feature vectors somehow? I can think of a few half-baked ideas off the top of my head, but I'm sure other people have already spent a lot of time thinking about this, and I'm wondering if there are any "best practices".

[+] jlos|10 years ago|reply

This would be very interesting when applied to Biblical Studies. Any serious academic discussion of biblical texts will involve syntactical breakdown of the text being discussed. Most of the times the ambiguities are clear, but its still quite common for a phrase to have several possible syntactical arrangements that are not immediately clear. These ambiguities are also challenging becuase the languages are dead (at least as used in the biblical texts). So the type of ambiguity of "Alice drove down the street in her car" can lead to some significant scholarly disagreement.

I could see Parsey McParseface helping identifying patterns in literature contemporaneous to the biblical texts. Certain idiomatic uses of syntax, which would have been obvious to the original readers, could be identified much more quickly.

[+] syllogism|10 years ago|reply

Most of the really good applications are part of larger systems. Parsing is good in machine translation, for instance. You transform the source text so that it's closer to the target language. Parsing is also useful for question answering, information extraction, text-to-speech...

Here's an example of using information from a syntactic parser to decorate words, and create an enhanced bag-of-words model: https://spacy.io/demos/sense2vec

Here's a very terse explanation of using them in a rule-based way: https://spacy.io/docs/tutorials/syntax-search

[+] yolesaber|10 years ago|reply

This is actually really useful for a project I'm working on. I'm trying to detect bias in news sources using sentiment analysis and one of the problems I've run into is identifying who exactly is the subject of a sentence. Using this could be really helpful in parsing out the noun phrases and breaking them down in order to find the subject.

[+] CAPSLOCKENGAGED|10 years ago|reply

Here is an application of parse trees: sentiment analysis with recursive neural networks based on how components of the parse tree combine to create the overall meaning.

http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf

They are useful as a preprocessing step for a lot of downstream NLP tasks. It shouldn't be hard to find more papers that take advantage of the tree structure of language.

[+] gipp|10 years ago|reply

The typical approach is something like a tree kernel (https://en.wikipedia.org/wiki/Tree_kernel). Looked into them briefly for a work project that never got off the ground, can't say too much about using them in practice.

[+] unknown|10 years ago|reply

[deleted]

[+] endtime|10 years ago|reply

> Parse trees are cool to look at, but what can I do with them?

One really simple and obvious thing is word sense disambiguation. Plenty of homonyms are different parts of speech (e.g. the verb "lead" and the noun "lead"). I'm sure there's lots of more sophisticated stuff you can do as well, but this might be the lowest-hanging fruit.

[+] bigiain|10 years ago|reply

Idea: point this at political speeches / security breach notifications / outage postmortems / etc, and rate them by how many ambiguities with starkly different dependancy parses there are... (Well of _course_ we mean the roads inside Alice's car when we made that commitment!)

[+] fpgaminer|10 years ago|reply

One of the projects I'd love to develop is an automated peer editor for student essays. My wife is an english teacher and a large percentage of her time is taken up by grading papers. A large percentage of that time is then spent marking up grammar and spelling. What I envision is a website that handles that grammar/spelling bit. More importantly, I'd like it as a tool that the students use freely prior to submitting their essays to the teacher. I want them to have immediate feedback on how to improve the grammar in their essays, so they can iterate and learn. By the time the essays reach the teacher, the teacher should only have to grade for content, composition, style, plagiarism, citations, etc. Hopefully this also helps to reduce the amount of grammar that needs to be taught in-class, freeing time for more meaningful discussions.

The problem is that while I have knowledge and experience in the computer vision side of machine learning, I lack experience in NLP. And to the best of my knowledge NLP as a field has not come as far as vision, to the extent that such an automated editor would have too many mistakes. To be student facing it would need to be really accurate. On top of that it wouldn't be dealing with well formed input. The input by definition is adversarial. So unlike SyntaxNet which is built to deal with comprehensible sentences, this tool would need to deal with incomprehensible sentences. According to the link, SyntaxNet only gets 90% accuracy on random sentences from the web.

That said, I might give SyntaxNet a try. The idea would be to use SyntaxNet to extract meaning from a broken sentence, and then work backwards from the meaning to identify how the sentence can be modified to better match that meaning.

Thank you Google for contributing this tool to the community at large.

[+] schoen|10 years ago|reply

I think this is still risky if used in a context where the student might think that the computer is somehow always right. Great English writers often deliberately use sentence fragments or puns, or use a word with a nonstandard part-of-speech interpretation (especially using a noun as a verb). They may also sometimes use a sentence that's difficult for readers to parse and then explain the ambiguity after the fact.

If a teacher gave students a grammar-checking tool to check their writing, they might assume that the tool knew better than they did, which is only sometimes true.

[+] kafkaesq|10 years ago|reply

Not sure how they work exactly, but have you looked at http://noredink.com (and as another commenter mentioned, http://grammarly.com/)? I'd be interested in your thoughts.

[+] tvural|10 years ago|reply

"A large percentage of that time is then spent marking up grammar and spelling."

As an aside, I don't think this is the optimal way to teach people how to write. What were the ideas in those papers? How were they organized? Do the student's arguments make sense? I think that's what most students spend most of their time thinking about when writing an essay, and it can be a bit demoralizing to see the teacher care just as much about whether the grammar was right. Most students can fix grammar mistakes relatively easily once they notice them anyway.

[+] danappelxx|10 years ago|reply

Doesn't Grammarly[0] already do this? It analyzes the input for common grammar mistakes and proposes ways to fix them. As a student, I occasionally use Grammarly to proofread a paper for me, and it has worked pretty well so far.

[0]: http://grammarly.com

[+] ilyaeck|10 years ago|reply

SyntaxNet is, by definition, for syntactic analysis - it would likely not help you much with semantics, to extract meaning. It could maybe help you automatically determine is a sentence is grammatically correct, though.

[+] fernly|10 years ago|reply

Such a checker could be a boon for students as well as instructors, but take note of this near the end of the article,

> This suggests that we are approaching human performance—but only on well-formed text.

It may fall down on exactly the bad writing you want to process. GIGO?

[+] jrgoj|10 years ago|reply

Now for the buffalo test[1]

`echo 'Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo' | syntaxnet/demo.sh'

   buffalo NN ROOT

   +-- buffalo NN nn

   |   +-- Buffalo NNP nn

   |   |   +-- Buffalo NNP nn

   |   |   +-- buffalo NNP nn

   |   +-- buffalo NN nn

   +-- Buffalo NNP nn

        +-- buffalo NNP nn

[1]: https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal...

[+] nihonde|10 years ago|reply

How does it handle the "translation" from that wiki page?

Bison from Buffalo, which bison from Buffalo bully, themselves bully bison from Buffalo.

[+] deanclatworthy|10 years ago|reply

It's really nice to have access to these kinds of tools. I am sure some folks from Google are checking this, so thank you.

Analysis of the structure of a piece of text is the first step to understanding its meaning. IBM are doing some good work in this area. http://www.alchemyapi.com/products/demo/alchemylanguage

Anything in the pipeline for this project to help with classifying sentiment, emotion etc. from text?

[+] zappo2938|10 years ago|reply

Yes, we derive syntactic meaning from grammatical structure. It's one thing getting a machine to understand grammar and another to get a human to understand. If anyone is interested, Doing Grammar by Max Morenberg is an excellent source of knowledge about grammar.[0]He approaches grammar very systematically which is helpful if people want to train machines.

[0] http://www.amazon.com/Doing-Grammar-Max-Morenberg/dp/0199947...

[+] feral|10 years ago|reply

I'd love to hear Chomsky's reaction to this stuff (or someone in his camp on the Chomsky vs. Norvig debate [0]).

My understanding is that Chomsky was against statistical approaches to AI, as being scientifically un-useful - eventual dead ends, which would reach a certain accuracy, and plateau - as opposed to the purer logic/grammar approaches, which reductionistically/generatively decompose things into constituent parts, in some interpretable way, which is hence more scientifically valuable, and composable - easier to build on.

But now we're seeing these very successful blended approaches, where you've got a grammatical search, which is reductionist, and produces an interpretable factoring of the sentence - but its guided by a massive (comparatively uninterpretable) neural net.

It's like AlphaGo - which is still doing search, in a very structured, rule based, reductionist way - but leveraging the more black-box statistical neural network to make the search actually efficient, and qualitatively more useful. Is this an emerging paradigm?

I used to have a lot of sympathy for the Compsky argument, and thought Norvig et al. [the machine learning community] could be accused of talking up a more prosaic 'applied ML' agenda into being more scientifically worthwhile than it actually was.

But I think systems like this are evidence that gradual, incremental, improvement of working statistical systems, can eventually yield more powerful reductionist/logical systems overall. I'd love to hear an opposing perspective from someone in the Chomsky camp, in the context of systems like this. (Which I am hopefully not strawmanning here.)

[0]Norvig's article: http://norvig.com/chomsky.html

[+] mdip|10 years ago|reply

This looks fantastic. I've been fascinated with parsers ever since I got into programming in my teens (almost always centered around programming language parsing).

Curious - The parsing work I've done with programming languages was never done via machine learning, just the usual strict classification rules (which are used to parse ... code written to a strict specification). I'm guessing source code could be fed as data to an engine like this as a training model but I'm not sure what the value would be. Does anyone more experienced/smarter than me have any insights on something like that?

As a side-point:

Parsy McParseface - Well done. They managed to lob a gag over at NERC (Boaty McBoatface) and let them know that the world won't end because a product has a goofy name. Every time Google does things like this they send an unconscious remind us that they're a company that's 'still just a bunch of people like our users'. They've always been good at marketing in a way that keeps that "touchy-feely" sense about them and they've taken a free opportunity to get attention for this product beyond just the small circle of programmers.

As NERC found out, a lot of people paid attention when the winning name was Boaty McBoatface (among other, more obnoxous/less tasteful choices). A story about a new ship isn't going to hit the front page of any general news site normally and I always felt that NERC missed a prime opportunity to continue with that publicity and attention. It became a topic talked about by friends of mine who would otherwise have never paid attention to anything science related. It would have been comical, should the Boaty's mission turn up a major discovery, to hear 'serious newscasters' say the name of the ship in reference to the breakthrough. And it would have been refreshing to see that organization stick to the original name with a "Well, we tried, you spoke, it was a mistake to trust the pranksters on the web but we're not going to invoke the 'we get the final say' clause because that wasn't the spirit of the campaign. Our bad."

[+] Someone|10 years ago|reply

For those wondering: the license appears to be Apache 2.0 (https://github.com/tensorflow/models)

[+] syncro|10 years ago|reply

Dockerized version so you try without installing:

https://hub.docker.com/r/brianlow/syntaxnet-docker/

[+] TeMPOraL|10 years ago|reply

> Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence.

Isn't the core observation about natural language that humans don't parse it at all? Grammar is a secondary, derived construct that we use to give language some stability; I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.

Anyway, some impressive results here.

[+] glup|10 years ago|reply

Various syntactic theories (HPSG, GPSG, minimalism, construction grammars) from linguistics are certainly derived constructs, but most researchers would agree that they all reflect real abstractions that humans make. I think the NLP community has good a job of harvesting the substantive aspects (which tend to be fairly conventionalized upon across theories) without overfitting on specific cases. "Alice drove down the street in her car" is easy for people to process, "The horse raced past the barn fell" is not, because it requires a pretty drastic reinterpretation of the structure when you get to the last word.

That said, there is some interesting work on "good-enough" language processing, which suggests that people maintain some fuzziness and don't fully resolve the structure when they don't need to. [1]

[1] http://csjarchive.cogsci.rpi.edu/proceedings/2009/papers/75/...

[+] zodiac|10 years ago|reply

No, the academic consensus is pretty much the opposite. For example by trying to rigorously state the way we form yes/no sentences in english - the process that converts "the man who has written the book will be followed" to "will the man who has written the book be followed?" instead of the incorrect "has the man who written the book will be followed?" - you will find that the rules must involve imposing some sort of tree structure on the original sentence. The fact that we do it correctly all of the time on sentences we've never seen before means that we must have parsed the original sentence.

(Example sentences taken from https://he.palgrave.com/page/detail/syntactic-theory-geoffre..., although any introductory linguistics/syntax textbooks will spend a few pages making the case that humans understand language by first parsing it into some kind of tree structure).

[+] bobwaycott|10 years ago|reply

I'm not sure about the claim on implicit lack of parsing structure. I read your example as who did what, where, in what. There must be some level of structural parsing and recognition so we understand it was Alice who drove in a car, that the car is owned by Alice, and that she, Alice, drove down the street, in her car. That we automatically understand all this seems to indicate some level of implicit parsing, right? Admittedly, it's been many years since I did any study of linguistics and language acquisition, so I'm pretty ignorant of the current state of knowledge here. Am I just layering my grammatical parsing atop an existing understanding that doesn't parse at all?

[+] andreasvc|10 years ago|reply

> I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.

You do need to analyze a sentence to understand it. Think of a classical attachment ambiguity such as "the boy saw the girl with the telescope". There are two readings of the sentence, and just like a Gestalt, you're typically perceiving it as one or the other. This involves a process of disambiguation, which is evidence that you have parsed the sentence.

[+] ohitsdom|10 years ago|reply

I'm sure it's only a matter of time before someone puts this online in a format easily played with. Looking forward to that

[+] xigency|10 years ago|reply

It's already available here - https://github.com/tensorflow/models/tree/master/syntaxnet

    echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh

    Input: Bob brought the pizza to Alice .
    Parse:
    brought VBD ROOT
     +-- Bob NNP nsubj
     +-- pizza NN dobj
     |   +-- the DT det
     +-- to IN prep
     |   +-- Alice NNP pobj
     +-- . . punct

[+] bpodgursky|10 years ago|reply

I have a visualizer for CoreNLP that's OSS, would be easy to adapt: http://nlpviz.bpodgursky.com/

[+] rspeer|10 years ago|reply

I'm glad they point out that we need to move on from Penn Treebank when measuring the performance of NLP tools. Most communication doesn't sound like the Penn Treebank, and the decisions that annotators made when labeling Penn Treebank shouldn't constrain us forever.

Too many people mistake "we can't make taggers that are better at tagging Penn Treebank" for "we can't make taggers better", when there are so many ways that taggers could be improved in the real world. I look forward to experimenting with Parsey McParseface.

[+] weinzierl|10 years ago|reply

Say, I wanted to use this for English text with a large amount of jargon. Do have to train my own model from scratch or is it possible to retrain Parsey McParseface?

How expensive is it to train a model like Parsey McParseface?

[+] scarface74|10 years ago|reply

I started working on a parser as a side project that could parse simple sentences, create a knowledge graph, and then you could ask questions based on the graph. I used http://m.newsinlevels.com at level 1 to feed it news articles and then you could ask questions.

It worked pretty well but I lost interest once I realized I would have to feed it tons of words. So could I use this to do something similar?

What programming language would I need to use?

[+] jventura|10 years ago|reply

As someone who has published work in the NLP area, I always take claimed results with a grain of salt. With that said, I still will have to read the paper to know the implementation details, although my problem with generic linguistic approaches such as this one seems to be is that it is usually hard to "port" to other languages.

For instance, the way they parse sequences of words may or may not be too specific to the English language. It is somewhat similar to what we call "overfitting" in the data-mining area, and it may invalidate this technique for other languages.

When I worked on this area (up to 2014), I worked mainly in language-independent statistical approaches. As with everything, it has its cons as you can extract information from more languages, but, in general, with less certainties.

But in general, it is good to see that the NLP area is still alive somewhere, as I can't seem to find any NLP jobs where I live! :)

Edit: I've read it in the diagonal, and it is based on a Neural Network, so in theory, if it was trained in other languages, it could return good enough results as well. It is normal for English/American authors to include only english datasets, but I would like to see an application to another language.. This is a very specialized domain of knowledge, so I'm quite limited on my analysis..

[+] unknown|10 years ago|reply

[deleted]

[+] the_decider|10 years ago|reply

According to their paper (http://arxiv.org/pdf/1603.06042v1.pdf), the technique can also be applied to sentence compression. It would be cool if Google publishes that example code/training-data as well.

[+] neves|10 years ago|reply

Shouldn't the title be renamed for "The World's Most Accurate Natural Language Parser For English"?

It's impressive how Google's natural language features, since the simpler spell check, degrades when it work with languages different from English.

[+] zodiac|10 years ago|reply

> It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures.

Does "possible" mean "syntactically valid" here? If so I'd be interested in a citation for it.

Also, I wonder what kind of errors it makes wrt to the classification in http://nlp.cs.berkeley.edu/pubs/Kummerfeld-Hall-Curran-Klein...

[+] joosters|10 years ago|reply

I don't see how a linguistic parser can cope with all the ambiguities in human speech or writing. It's more than a problem of semantics, you also have to know things about the world in which we live in order to make sense of which syntactic structure is correct.

e.g. take a sentence like "The cat sat on the rug. It meowed." Did the cat meow, or did the rug meow? You can't determine that by semantics, you have to know that cats meow and rugs don't. So to parse language well, you need to know an awful lot about the real world. Simply training your parser on lots of text and throwing neural nets at the code isn't going to fix this problem.

[+] preserves|10 years ago|reply

This is exactly the type of problem that a good parser should be able to solve, and training a parser on lots of data and throwing neural nets may indeed be a viable solution. Why wouldn't it be? The article describes how their architecture can help make sense of ambiguity.

In terms of a basic probabilistic model, P(meow | rug) would be far lower than P(meow | cat), and that alone would be enough to influence the parser to make the correct decision. Now, if the sentence were "The cat sat on the rug. It was furry", that would be more ambiguous, just like it is for an actual human to decode. But models trained on real-world data do learn about the world.

235 comments