
AllenNLP – An open-source NLP research library, built on PyTorch

194 points | varunagrawal | 8 years ago | allennlp.org

25 comments

[+] TekMol | 8 years ago
Wow, is this really state of the art?

    Joe did not buy a car today.
    He was in buying mood.
    But all cars were too expensive.

    Why didn't Joe buy a car?

    Answer: buying mood
I think I have seen similar systems for decades now. I thought we would be further along by now.

I have tried for 10 or 20 minutes now, but I can't find any evidence that it has much sense of syntax:

    Paul gives a coin to Joe.

    Who received a coin?

    Answer: Paul
All it seems to do is extract candidates for "who", "what", "where", etc. So it seems to figure out correctly that "Paul" is a potential answer for "Who".

No matter how I rephrase the "Who" question, I always get "Paul" as the answer. "Who? Paul!", "Who is a martian? Paul!", "Who won the summer olympics? Paul", "Who got a coin from the other guy? Paul!"

Same for "what" questions:

    Gold can not be carried in a bag. Silver can.

    What can be carried in a bag?

    Answer: Gold
[+] galenko | 8 years ago
Sadly, the NLP world is full of hot air. I've seen so many companies get funding for complete "written by a 12-year-old" dogshit "industry leading IP", it's not even funny anymore.

The hype has gone down and some are actually doing great work, but 90% of the people who say they do NLP/AI stuff don't even fundamentally understand what NLP/AI is.

[+] glup | 8 years ago
All of the above require fairly complex world knowledge as well as an explicit representation of a scene. There is minimal leverage for lexical distributional statistics in these cases—arguably the one thing we have had major success in using (e.g. building vector space word representations, like Word2Vec; finding the highest probability parse tree for an utterance).
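The lexical distributional statistics mentioned above can be sketched in a few lines: represent each word by the counts of the words that appear near it, then compare representations by cosine similarity. This is a toy count-based model (my own illustration, with a made-up corpus and helper names; real vector-space models like Word2Vec learn dense vectors instead):

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Build count-based context vectors: each word is represented by the
    counts of words appearing within `window` positions of it."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell sharply on monday",
]
vecs = cooccurrence_vectors(corpus)
# Words used in similar contexts ("cat"/"dog") score higher than
# words from unrelated contexts ("cat"/"stocks").
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["stocks"]))  # True
```

This kind of statistic captures "which words keep similar company", which is exactly why it offers so little leverage for the world-knowledge questions in the examples above.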
[+] senatorobama | 8 years ago
A key tenet of supervised learning is that you will only ever do as well as what's in your training set.
[+] halflings | 8 years ago
The difference with new NN-based systems is that they are trained end-to-end and learn the syntax and some form of "reasoning". Check out Memory Networks, by Facebook, for example (two NNs, one for "reasoning" and one for storing long-term data; quite impressive).

Now, it's still an area of active research... and I'm not sure what "state-of-the-art" means for this library; somebody said that it ranks 27th on some commonly used dataset.

[+] rubyfan | 8 years ago
I can’t imagine that the human mind works even remotely the way these NLP systems work. Grammars, tokenizing, matrices... there must be a better approach.
[+] make3 | 8 years ago
Try the more complicated text sources. It is able to parse them and still answer questions reasonably well.
[+] mamp | 8 years ago
This is very brittle: it works really well on the pre-canned examples but the vocabulary seems very tightly linked. It doesn't handle something as simple as:

'the patient had no pain but did have nausea'

It doesn't yield anything helpful on semantic role labeling and didn't even parse for machine comprehension. If I vary it to ask, say, 'did the patient have pain?', the answer is 'nausea'.

CoreNLP provides much more useful analysis of the phrase structure and dependencies.

[+] vbuwivbiu | 8 years ago
"the squid was walked by the woman"

"what is the fifth word in that sentence?"

Answer: squid

[+] strin | 8 years ago
We need more demos of AI models: there is what people claim their model does, and there is what the model actually does.
[+] wyldfire | 8 years ago
How does this compare with spacy?
[+] glup | 8 years ago
Different set of tasks. SpaCy is focused on bread-and-butter tasks like tokenization, part-of-speech tagging, and dependency parsing (not to say that these are easy, but they are things people have been working on for a long time). AllenNLP seems focused on distributing relatively recent neural models (last few years) of more complex language understanding, like labeling semantic roles (agents, patients, etc.) and identifying textual entailments (i.e., mining facts from a sentence). It is not great at these tasks, because they are very difficult and a very active area of ongoing research.