arnioxux | 7 years ago
That's hilarious!
But I found the criticism of their toy task less convincing. Algorithmic toy tasks can always be solved "without any training whatsoever".
For example, in RNNs there's a toy task that adds two numbers that are far apart in a long sequence. This can be solved deterministically with a one-liner, but that's not the point: it's still useful for demonstrating RNNs' failure on long sequences. Would you then call the subsequent work to make RNNs handle long sequences just feature engineering with no universality?
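For concreteness, that toy task looks roughly like this (formats vary between papers; this particular setup, and the trivial closed-form solution, are just an illustration):

```python
import numpy as np

# Toy "adding problem": each timestep carries (value, marker); the target
# is the sum of the two values whose marker is 1. An RNN has to carry those
# values across the whole sequence, which is exactly where vanilla RNNs fail.
rng = np.random.default_rng(0)
T = 200  # sequence length; the task gets harder as T grows
values = rng.uniform(0, 1, T)
markers = np.zeros(T)
markers[rng.choice(T, size=2, replace=False)] = 1
target = values[markers == 1].sum()

# The deterministic "one-liner" that solves it with no training at all:
prediction = (values * markers).sum()
assert np.isclose(prediction, target)
```

The point stands: the existence of this one-liner doesn't make the task useless as a benchmark.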
In that sense, I think their choice of toy task is fine. They're just pointing out that position is a feature currently overlooked in the many architectures that are heavily position-dependent (they showed much better results on Faster R-CNN, for example).
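The core trick of making position an explicit input feature can be sketched in a few lines: append normalized x/y coordinate channels to the image before the first convolution. This is a CoordConv-style illustration, not the paper's exact implementation:

```python
import numpy as np

def add_coord_channels(img):
    """Append two coordinate channels (y, x in [-1, 1]) to an (H, W, C) image."""
    h, w, _ = img.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    return np.concatenate([img, ys[..., None], xs[..., None]], axis=-1)

img = np.zeros((8, 8, 3))
out = add_coord_channels(img)
assert out.shape == (8, 8, 5)   # original channels plus two coordinate maps
assert out[0, 0, 3] == -1.0     # top row: y coordinate is -1
```

A downstream convolution can then read position directly instead of having to infer it from padding artifacts.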
fwilliams | 7 years ago
This work shows that the network architecture is a strong enough prior to effectively learn this set of tasks from a single image. Note that there is no pretraining here whatsoever.
More to your point, I think the big problem with toy tasks is not so much the tasks as the datasets. A lot of datasets (particularly in my field of geometry processing) have a tremendous amount of bias toward certain features.
A lot of papers will show results trained and evaluated on some toy dataset. Maybe their claim is that using such-and-such a feature as input improves test performance on such-and-such a problem and dataset.
The problem with these papers often shows up when you try to generalize to data that is similar but not drawn from the toy dataset. A lot of applied ML papers fail to generalize even moderately, and the authors almost never test for or report this failure. As a result, I think we can spend a lot of time designing overfitted solutions to particular problems and datasets.
On the flip side, there are plenty of good papers that do careful analysis of their methods' ability to generalize and solve a problem, but when digging through the literature it's important to be judicious. I've wasted time testing methods that turn out to work very poorly.