I want to write a program that classifies its input. For example, if the input is Japan, it should output country; reddit --> website; Ferrari --> car. For some inputs this is as easy as looking at the first sentence of the Wikipedia article for that word. But I want to get more specific, such as Ferrari --> car --> sports car, or reddit --> website --> website with user-generated content. I am sure people have done this in NLP; a pointer to where I should start would be very much appreciated.
yannyu|12 years ago
You would take a corpus of documents, manually "tag" them with the entities you're looking for, and designate that as your training corpus. You would then run that through a machine learning toolkit (such as Apache OpenNLP, https://opennlp.apache.org/) to train a model, and then use the resulting model to process new text and identify the entities it was trained on.
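To make the tag-train-classify loop concrete, here is a minimal sketch in plain Python. It is not OpenNLP's API: it swaps in a tiny naive Bayes text classifier with add-one smoothing, and the six tagged sentences, the label names, and the function names (tokenize, train, classify) are all made up for illustration. A real training corpus would need to be far larger.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-tagged training corpus: (text, label) pairs.
TRAINING = [
    ("Japan is an island country in East Asia", "country"),
    ("France is a country in Western Europe", "country"),
    ("Ferrari is an Italian sports car manufacturer", "car"),
    ("Porsche builds sports cars in Germany", "car"),
    ("Reddit is a website with user generated content", "website"),
    ("Wikipedia is a website anyone can edit", "website"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count tokens per label, documents per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = tokenize(text)
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_tokens = sum(word_counts[label].values())
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("Lamborghini is an Italian sports car maker", model))
print(classify("Germany is a country in Europe", model))
```

Note that this classifies a sentence describing the thing rather than the bare word, so an input like "Ferrari" would first need some describing text (for example, the first sentence of its Wikipedia article, as the question suggests).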
hakann|12 years ago
msantos|12 years ago
http://en.wikipedia.org/wiki/DBpedia
http://en.wikipedia.org/wiki/Freebase
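For the Ferrari --> car --> sports car kind of hierarchy, a knowledge base like DBpedia already stores types and their parent classes. Below is a rough sketch that only builds a SPARQL query and a request URL for a given name; it does not send anything. The endpoint URL (https://dbpedia.org/sparql), the format parameter, the resource naming rule (spaces become underscores), and the helper names are assumptions on my part, so check them against the DBpedia documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed public DBpedia SPARQL endpoint; verify before use.
ENDPOINT = "https://dbpedia.org/sparql"


def resource_uri(name):
    """Guess the DBpedia resource URI for a name (assumption: spaces -> underscores)."""
    return "http://dbpedia.org/resource/" + quote(name.strip().replace(" ", "_"))


def build_type_query(name):
    """Build a SPARQL query for a resource's ontology types and their parent classes."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?type ?parent WHERE {\n"
        f"  <{resource_uri(name)}> a ?type .\n"
        "  OPTIONAL { ?type rdfs:subClassOf ?parent }\n"
        '  FILTER(STRSTARTS(STR(?type), "http://dbpedia.org/ontology/"))\n'
        "}"
    )


def build_request_url(name):
    """Build a GET URL for the query; the 'format' parameter is an assumption."""
    params = {
        "query": build_type_query(name),
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)


print(build_type_query("Japan"))
print(build_request_url("Japan"))
```

Each result row would pair a type with its parent class, which you could follow upward to get a chain from specific to general.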
zharkov|12 years ago
sp332|12 years ago
hakann|12 years ago