In general, it allows a better approximation of the target function with far fewer hidden neurons. Sure, you could get arbitrarily close using a single hidden layer, but that layer might need to be unfathomably wide. The same idea applies to network topology in multilayer nets - a network could eventually learn to set many of its weights to zero, but training is much faster and more effective if you start from a good problem-specific topology. Depth makes problems more tractable. Recurrence is the real game-changer, since it moves you from non-linear function approximation up to Turing completeness (at least over the set of all possible RNNs).
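As a rough illustration of the "fewer neurons" point (a toy parameter count I made up, not from the original claim): a stack of narrow layers can use far fewer parameters than one very wide layer, while depth is what lets the network compose features and (for ReLU nets) carve out many more linear regions per parameter.

```python
def param_count(layer_widths):
    """Total weights + biases for a fully connected net
    with the given layer widths (input, hidden..., output)."""
    return sum(w_in * w_out + w_out
               for w_in, w_out in zip(layer_widths, layer_widths[1:]))

# One very wide hidden layer: 1 -> 1024 -> 1
shallow = param_count([1, 1024, 1])

# Four narrow hidden layers: 1 -> 16 -> 16 -> 16 -> 16 -> 1
deep = param_count([1, 16, 16, 16, 16, 1])

print(shallow, deep)  # 3073 vs 865 parameters
```

The widths here are arbitrary; the point is just that the deep net has a small fraction of the parameters, yet composition across layers can buy it expressive power the shallow net has to pay for neuron by neuron.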