I understand that you do mention the pre-training / transfer learning approach clearly, but isn't it disingenuous to claim that you provide better performance based on (only) 100 labeled examples, when the pre-training dataset (Wikitext-103) actually contains 103M words?
jph00|7 years ago
It is totally correct and in no way misleading to say we need only 100 labeled examples. Anyone can get similar results on their own datasets without even needing to train their own wikitext model, since we've made the pre-trained model available.
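To make the claim concrete: the 103M words of Wikitext-103 are *unlabeled* text used only for pre-training, while the 100 examples are the *labeled* data a user must supply for their downstream task. A minimal sketch of that workflow, with a stand-in feature extractor (a frozen random projection, purely illustrative; the real ULMFiT encoder is an AWD-LSTM, and none of the names below come from the fastai API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained encoder: a fixed (frozen) feature map.
# In ULMFiT this would be a language model trained on Wikitext-103;
# here it is just a random projection, to illustrate the workflow.
W_pretrained = rng.normal(size=(50, 16))

def encode(x):
    # Frozen features: no gradient updates ever touch W_pretrained.
    return np.tanh(x @ W_pretrained)

# Only 100 *labeled* examples are supplied for the downstream task.
X = rng.normal(size=(100, 50))
y = (X[:, 0] > 0).astype(float)  # toy binary labels

def head_loss(w, b):
    # Cross-entropy of a logistic-regression head on frozen features.
    p = 1 / (1 + np.exp(-(encode(X) @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Train only the small head; this is the cheap, label-hungry part.
w, b = np.zeros(16), 0.0
initial_loss = head_loss(w, b)
for _ in range(500):
    p = 1 / (1 + np.exp(-(encode(X) @ w + b)))
    grad = p - y
    w -= 0.1 * encode(X).T @ grad / len(y)
    b -= 0.1 * grad.mean()
final_loss = head_loss(w, b)
```

The expensive, data-hungry stage (pre-training) is done once on unlabeled text and shared; each user then fits only the tiny head above on their 100 labels.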
(BTW, I see you work at a company that sells something that claims to "categorize SKUs to a standard taxonomy using neural networks." This seems like something you maybe could have mentioned.)
ramanan|7 years ago
Also, I don't understand the need to be so defensive, or what relevance my employer has to my post.