skybrian|7 years ago
https://openreview.net/forum?id=SkfMWhAqYQ
mlucy|7 years ago
I hadn't read it before! That's a fascinating result, actually. They emphasize interpretability in the paper, but I find it more interesting that you can do so well with only local information.
My first thought is that it makes sense that averaging together a bunch of local predictions would work well on the ImageNet task, since the different classes tend to have obviously different local textures, and class-relevant information makes up a large part of the image. I would be very curious to see if the technique is as competitive for other tasks.
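The averaging idea can be sketched in a few lines. This is a toy illustration, not the paper's model: `patch_scores` is a made-up stand-in for a small local classifier, and the "image" is a 1-d list of pixels.

```python
# Toy sketch: classify an "image" by averaging independent local
# (patch-level) predictions. patch_scores is a hypothetical stand-in
# for a learned local classifier; it scores two fake texture classes.

def patch_scores(patch):
    # Crude texture cue: total adjacent-pixel variation within the patch.
    variation = sum(abs(a - b) for a, b in zip(patch, patch[1:]))
    # Scores for ["stripy", "smooth"]: high variation favors "stripy".
    return [variation, (len(patch) - 1) - variation]

def classify(image, patch_size=2):
    # Slide a window over the image, score each patch independently,
    # then average the per-patch scores into one global prediction.
    patches = [image[i:i + patch_size]
               for i in range(len(image) - patch_size + 1)]
    scores = [patch_scores(p) for p in patches]
    avg = [sum(s[c] for s in scores) / len(scores) for c in range(2)]
    return "stripy" if avg[0] > avg[1] else "smooth"

print(classify([0, 1, 0, 1, 0, 1]))  # alternating pixels -> "stripy"
print(classify([1, 1, 1, 1, 1, 1]))  # constant pixels -> "smooth"
```

No patch ever sees the whole image; the global decision comes purely from averaging local evidence, which is the property the paper trades on.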
yazr|7 years ago
I come from deep reinforcement learning. When considering simulated environments (such as AlphaZero or AlphaStar), can feature engineering dramatically improve the CPU requirements or sample efficiency?
Or are low-level features the "easiest" part for the network to learn?
Edit1: I understand, of course, the academic purity of working from raw data.
Edit2: So simulated means lots of samples and on-policy learning, but also very CPU intensive.
fouc|7 years ago
If you have a small to medium sized dataset of images or text, deep feature extraction would be the first thing I'd try.
mlucy|7 years ago
I'm not sure what the most interesting problems with that property are. Maybe making specialized classifiers for people based on personal labeling? I've always wanted e.g. a Twitter filter that excludes specifically the tweets that I don't want to read from my stream.
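The deep-feature-extraction workflow being discussed looks roughly like this sketch. Everything here is a made-up stand-in: `embed` plays the role of a frozen pretrained network's penultimate layer, and the classifier on top is a trivial nearest-centroid rule fit on a tiny personally-labeled set.

```python
# Sketch: treat a frozen "pretrained" model as a fixed feature extractor,
# then fit a trivial classifier on a handful of personal labels.
# embed is a hypothetical stand-in for real deep features.

def embed(text):
    # Stand-in features: fraction of digits and of uppercase characters.
    n = max(len(text), 1)
    return [sum(c.isdigit() for c in text) / n,
            sum(c.isupper() for c in text) / n]

def fit_centroids(labeled):
    # Average the feature vectors per class (nearest-centroid classifier).
    groups = {}
    for text, label in labeled:
        groups.setdefault(label, []).append(embed(text))
    return {lbl: [sum(col) / len(col) for col in zip(*vecs)]
            for lbl, vecs in groups.items()}

def predict(text, centroids):
    # Assign the class whose centroid is nearest in feature space.
    v = embed(text)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

train = [("CALL 555 0199 NOW", "spam"), ("WIN 1000 dollars", "spam"),
         ("see you at lunch", "ham"), ("thanks for the notes", "ham")]
centroids = fit_centroids(train)
print(predict("FREE 999 prize", centroids))  # -> "spam"
```

Because the feature extractor is fixed, only the tiny classifier needs fitting, which is why this works even with very few personal labels.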
asavinov|7 years ago
IMHO, (deep) feature engineering is important in these cases:
o The lower the level of the representation, the more important it is to increase the level of abstraction by learning or manually defining new features.
o In the presence of (fine-grained) raster data, (automated) feature engineering is especially important. Therefore, feature engineering is important in audio analysis (1-d raster) and video analysis (2-d raster).
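As a concrete illustration of raising the abstraction level over a 1-d raster (an assumed example, not from the comment): short-window energy and zero-crossing rate are two classic hand-engineered audio features computed from raw samples.

```python
# Sketch: hand-engineered features over a 1-D raster (audio-like signal).
# Raw samples are very low level; windowed summary statistics raise the
# level of abstraction before any learning happens.

def frame(signal, size, hop):
    # Split the signal into fixed-size windows taken every `hop` samples.
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def energy(window):
    # Mean squared amplitude: a loudness proxy.
    return sum(x * x for x in window) / len(window)

def zero_crossing_rate(window):
    # Fraction of adjacent sample pairs that change sign: a noisiness proxy.
    crossings = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
    return crossings / (len(window) - 1)

signal = [1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5]
features = [(energy(w), zero_crossing_rate(w)) for w in frame(signal, 4, 4)]
print(features)  # -> [(1.0, 1.0), (0.25, 0.0)]
```

The first window (alternating samples) comes out loud and noisy, the second (constant samples) quiet and smooth, so a downstream model sees two easily separable feature vectors instead of eight raw samples.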
julius_set|7 years ago
mlucy|7 years ago
I don't work with time series data much myself. I would imagine you can get at least some transfer learning, since there are patterns that show up across different domains. It looks like there's been a little bit of work done on this: https://arxiv.org/pdf/1811.01533.pdf
According to them, transfer learning can improve a time series model if you pick the right dataset to transfer from, but they don't seem to be getting the same unbelievably strong transfer results that you'd see on images and text.
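The transfer recipe can be shown with a deliberately tiny toy (not the linked paper's method): pretrain a one-parameter autoregressive model on a long "source" series, then use its learned coefficient to warm-start a short fine-tune on a small "target" series. All names and constants here are invented for illustration.

```python
# Toy sketch of transfer for time series: pretrain on a long source
# series, then fine-tune from that warm start on a tiny target series.

def fit_ar1(series, w=0.0, lr=0.05, epochs=2000):
    # Gradient descent on mean squared error of the model x[t] ~ w * x[t-1].
    for _ in range(epochs):
        grad = sum((w * a - b) * a for a, b in zip(series, series[1:]))
        w -= lr * grad / (len(series) - 1)
    return w

source = [0.9 ** t for t in range(50)]     # long related series, decay 0.9
target = [2 * 0.9 ** t for t in range(6)]  # tiny target series, same decay

w_scratch = fit_ar1(target, w=0.0, epochs=3)               # cold start, 3 steps
w_transfer = fit_ar1(target, w=fit_ar1(source), epochs=3)  # warm start, 3 steps

# With the same tiny fine-tuning budget, the warm start lands much
# closer to the true coefficient 0.9 than training from scratch does.
print(round(w_scratch, 3), round(w_transfer, 3))
```

The catch the paper points at survives even in the toy: the warm start only helps because the source series shares structure with the target; transfer from an unrelated source would initialize the fine-tune somewhere unhelpful.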
jewelthief91|7 years ago
Considering the rate of change in this field, what would be beneficial to learn for people who don't actually get to use machine learning in their day-to-day jobs? I'd love to dive in and learn more about machine learning, but I don't want to waste time learning something that will be totally irrelevant in a couple of years.