mlucy | 7 years ago
Apologies for the super long response, but you had a lot of points.
> Am I really missing something here or this thing is a complete nonsense with no actual use cases what's so ever in practice?
Hopefully you're missing something, or we've been wasting a great deal of our time ;)
> There are a number of off-the-shelf models that would give you image/sentence embeddings easily. Anyone with a sufficient understanding of embeddings/word2vec would have no trouble training an embedding that is catered to the specific application, with much better quality.
For images and text, it's definitely true that you can train your own embeddings with an off-the-shelf model. But I think it's more likely that we end up in a place where a small number of people train a bunch of really good models and everyone else uses them.
I think this for three reasons:
1. It's what we've seen with word2vec. The vast majority of people that use word2vec aren't training it themselves, they're downloading pretrained weights.
2. Most people don't have enough data to train a good embedding themselves. There are good public datasets for images and text, but we're planning to produce embeddings for more niche verticals too.
Keep in mind that modern deep neural nets are very data hungry, and the problem gets worse every year. In a few years I think we're going to be in a spot where getting state of the art performance requires a lot of compute, and more data than most people have access to.
3. Prebuilt embeddings drastically speed up development. If you have a traditional model, and you think feeding some images into it might improve it, you can test that hypothesis in twenty minutes with Basilica. We've talked to a lot of teams that have high-dimensional data lying around which they think might improve their models, but they aren't sure, and they can't really justify a week or two of someone's time to explore it.
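To make the workflow in points 1 and 3 concrete, here is a minimal sketch of "feed an embedding into a traditional model": sentence vectors built from pretrained word vectors, used as features for a plain logistic regression. The tiny random embedding table is purely a stand-in; in practice you would download real pretrained weights (e.g. word2vec or GloVe) or call an embedding service.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for pretrained word vectors -- in reality these come from
# downloaded word2vec/GloVe weights or an embedding API, not random numbers.
rng = np.random.default_rng(0)
vocab = ["good", "great", "fine", "bad", "awful", "poor"]
embeddings = {w: rng.normal(size=8) for w in vocab}

def embed(sentence):
    # Average the word vectors: the simplest possible sentence embedding.
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    return np.mean(vecs, axis=0)

# Embeddings become ordinary dense features for a classical model.
X = np.stack([embed(s) for s in ["good great", "fine good", "bad awful", "poor bad"]])
y = [1, 1, 0, 0]
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```

The point is the shape of the pipeline, not the model: once text (or images) are mapped to fixed-length vectors, testing "does this data help?" is a twenty-minute experiment with whatever classifier you already have.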
> For NLP applications, the corpus quality dictates the quality of the embedding if you use simple W2V. Word2Vec trained on the Google News corpus is not going to be useful for a chatbot, for instance. Different models also give different quality of embedding. As an example, if you use Google BERT (bi-directional LSTM) then you would get world-class performance in many NLP applications.

> The embedding is so model/application-specific that I don't see how a generic embedding would be useful in serious applications. Training a model these days is so easy to do. Calling the TensorFlow API is probably easier than calling the Basilica API 99% of the time.
It's definitely true that you usually want your input distribution to be reasonably close to the distribution the embedding was trained on. (Although it's worth noting that having a different distribution for your embedding acts as a form of regularization, and sometimes that matters more than the problems you get from the distributional shift.)
I think you're overstating the case though. An embedding trained on a wide variety of sources will perform really well on a lot of tasks, and often other things, like the amount of data you trained on, matter more than distributional similarity.
You may find https://research.fb.com/wp-content/uploads/2018/05/exploring... interesting, especially the end of section 3.1.2. The paper trains a giant network on billions of Instagram images, and then explores both fine-tuning it on Imagenet and using the features of the last layer as inputs to a logistic regression (which they call "feature transfer" rather than "embedding").
The logistic regression trained on the Instagram features gets 83.6% top-1 accuracy, compared to 85.4% for full network fine-tuning and 80.9% for a ResNeXt model trained directly on ImageNet.
In other words, the effect of the larger training set dominated the distributional shift.
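The "feature transfer" setup from the paper can be sketched in a few lines: a frozen pretrained network produces features, and only a logistic regression on top is trained. The random projection below is just a placeholder for the real pretrained ResNeXt; everything else mirrors the structure of the experiment.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Placeholder "pretrained" feature extractor: a frozen random projection
# plus ReLU. In the paper this role is played by the last layer of a
# ResNeXt trained on billions of Instagram images.
rng = np.random.default_rng(0)
X_raw, y = load_digits(return_X_y=True)
W = rng.normal(size=(64, 32))            # frozen weights, never updated

features = np.maximum(X_raw @ W, 0)      # extract fixed features
# Only this final linear classifier is trained -- that's feature transfer.
clf = LogisticRegression(max_iter=1000).fit(features, y)
print(round(clf.score(features, y), 2))
```

Full fine-tuning would instead update the extractor's weights as well; the paper's result is that even with the extractor frozen, a big enough pretraining set closes most of that gap.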