
paperwork | 7 years ago

I’ve seen this mentioned before, including a blog post by the fast.ai folks. Any idea where I can get details? If my tabular data set is small, what kind of embedding can I get out of it? Or is the idea that a larger data set is used for embeddings of categorical data?

yorwba|7 years ago

Pre-trained embeddings are only helpful if they are trained on a different (ideally larger) dataset or even a different task, but with the same kind of input data. So you would need to find out where else something similar to the data in your tables appears. If some of the data is text, word embeddings may be applicable. Or if you're trying to analyze user activity by time and location, you might try to transfer what can be learned about the influence of holidays from publicly observable activity e.g. on Twitter (just a random idea that popped into my head, no guarantee that it can actually work).

Of course if all you have are numbers without context, there isn't a lot you can do to improve the situation.
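A minimal sketch of the transfer idea above, assuming a table of vectors already learned on some larger external dataset. The `PRETRAINED` table, its values, and the helper names are all hypothetical and only illustrate the mechanics: look up each categorical value, fall back to a zero vector for values the external data never saw, and append the numeric columns.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pre-trained vectors standing in for embeddings learned on a
# larger external dataset. The numbers are made up for illustration.
PRETRAINED: Dict[str, List[float]] = {
    "monday": [0.9, 0.1],
    "saturday": [0.1, 0.8],
    "sunday": [0.0, 0.9],
}
DIM = 2  # length of each pre-trained vector


def embed_category(value: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Return the pre-trained vector for a category, or zeros if it is unknown."""
    return list(table.get(value.lower(), [0.0] * dim))


def featurize(
    rows: Sequence[Tuple[str, float]],
    table: Dict[str, List[float]] = PRETRAINED,
    dim: int = DIM,
) -> List[List[float]]:
    """Turn (category, numeric) rows into embedding + numeric feature vectors."""
    return [embed_category(cat, table, dim) + [float(x)] for cat, x in rows]


# "Friday" is absent from the table, so it falls back to the zero vector.
features = featurize([("Sunday", 3.0), ("Friday", 1.5)])
```

The zero-vector fallback is only one possible choice. With a small table, many categories may be missing from the external data, in which case the lookup adds little.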

sdenton4|7 years ago

I think this is mainly a thing for perception (images and sounds). Tabular data would have to match up with the training dataset, and "most" interesting tabular models are the sorts of things guarded like piles of gold by the businesses that build them...