jkldotio | 9 years ago
You could then apply classical text indexing to those annotations, perhaps with a topic model like LDA. An image containing a plane would then be indexed under "plane" via the output of the neural network, but would also come first, or in the top results, for the query "flight" via the topic model.
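A minimal sketch of the idea, with invented data: an inverted index over neural-net labels, plus hand-written "topic" term groups standing in for what LDA would learn (the annotations, topics, and image names are all hypothetical).

```python
from collections import defaultdict

# Assumed output of an image-annotation network: image id -> predicted labels.
annotations = {
    "img1.jpg": ["plane", "sky"],
    "img2.jpg": ["dog", "grass"],
}

# Toy topic clusters playing the role of LDA topics: each groups related terms.
topics = {
    "aviation": {"plane", "flight", "sky", "airport"},
    "animals": {"dog", "cat", "grass"},
}

# Build the inverted index: term -> set of image ids.
index = defaultdict(set)
for image, labels in annotations.items():
    for label in labels:
        index[label].add(image)

def search(query):
    """Return images indexed under the query term or any term sharing a topic with it."""
    hits = set(index.get(query, set()))
    for terms in topics.values():
        if query in terms:
            for term in terms:
                hits |= index.get(term, set())
    return hits

print(search("flight"))  # finds img1.jpg via the "aviation" topic, though no image is labelled "flight"
```

In a real system the topic clusters would come from fitting LDA over the annotation text rather than being written by hand.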
Ditto for word2vec or para2vec over those words; the benefit is that you can bring the relational knowledge contained in the textual training data, Wikipedia or something else, to bear on the problem. E.g. a golf club and a baseball glove might not be correlated in the neural network that annotated the images, but might be correlated in a text-based model trained on Wikipedia, so a query for "sport" could bring both images up.
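The embedding idea can be sketched with cosine similarity over toy vectors. The vectors below are invented for illustration, not real word2vec output; in practice you'd load pretrained embeddings (e.g. via gensim) instead.

```python
import math

# Hypothetical 3-d embeddings: "sport" sits near both sports items,
# far from "airplane", mimicking what text-trained vectors capture.
vectors = {
    "sport":          [0.9, 0.8, 0.1],
    "golf_club":      [0.8, 0.7, 0.2],
    "baseball_glove": [0.7, 0.9, 0.1],
    "airplane":       [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def expand(query, threshold=0.9):
    """Labels whose embedding is close enough to the query term's embedding."""
    q = vectors[query]
    return {w for w, v in vectors.items() if w != query and cosine(q, v) >= threshold}

print(expand("sport"))  # {'golf_club', 'baseball_glove'} but not 'airplane'
```

A query for "sport" expands to both sports-related labels even though nothing in the image annotator linked them, which is exactly the cross-domain knowledge transfer described above.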
The big players like Google already have caption generation that captures relationships between objects.[1]
[0] http://scikit-image.org/docs/dev/auto_examples/features_dete...
[1] https://arxiv.org/pdf/1411.4555.pdf