top | item 15822931

zgao | 8 years ago

Although probably not sufficient for AGI, network architecture is almost certainly important, for two reasons: there is ample empirical evidence that architecture matters, and there are good numerical reasons to expect it to.

In the first category (empirical evidence),

- The discrete leap from non-LSTM RNN to LSTM network performance on NLP was essentially due to a "better factoring of the problem": breaking out the primitive operations that equate to an RNN having "memory" had a substantial effect on how well it "remembered."

- The leap in NMT from LSTM seq2seq to attention-based methods (the Transformer by Google) is another example. Performance on long-distance correlations made yet another leap because the architecture simply models them more directly than the LSTM does.

- The relation network by DeepMind is another excellent example of a drop-in, "pure" architectural intuition-motivated replacement that increased accuracy from the 66% range to the 90% range on various tasks. Again, this was through directly modeling and weight-tying relation vectors through the architecture of the network.

- The capsule network for image recognition is yet another example. By shifting the focus of the architecture from arbitrarily guaranteeing only positional invariance to guaranteeing other sorts, the network was able to do much better at overlapping MNIST. Again, a better factoring of the problem.
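To make the attention bullet concrete, here is a minimal sketch in plain Python of scaled dot-product attention (single head, no learned projections). The function names and toy vectors are my own assumptions for illustration, not code from any paper. The point is that the score between a query and a key depends only on their content, not on how far apart they sit in the sequence, so a long-distance dependency is reached in one step rather than through many recurrent steps.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query (illustrative sketch).

    Each score depends only on the query and that key, never on the
    key's position in the sequence.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights
```

If you put the one matching key at position 0 or at position 49 of a 50-element sequence, it should receive the same attention weight either way, which is the "modeled more directly" property in miniature.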

These developments all illustrate that picking the architecture and the numerical guarantees baked into the "factoring" of the architecture (for example, weight tying, orthogonality, invariance, etc.) can have and has had a profound effect on performance. There is no reason to believe this trend won't continue.

In fact, there are some very interesting ways to think about the principles behind network structure -- I can't say for sure that this view has any predictive power yet, but types are one intuitively appealing way to look at it: http://colah.github.io/posts/2015-09-NN-Types-FP/

cs702 | 8 years ago

Thanks. I agree. The anecdotal evidence suggests that architecture is indeed important.

This paper is the only direct evidence I've seen of it, though.

Great work. Compelling.