top | item 12757449

aab0 | 9 years ago

We have to hardwire architectures because we can't learn architectures yet; or, to put it another way, backpropagation doesn't yet work as well on hyperparameters as it does on parameters. Hyperparameters should be learnable, since in theory there's nothing special about them (it's models all the way down!): a hyperparameter is merely a parameter we don't yet know how to learn. And this has been demonstrated: http://jmlr.org/proceedings/papers/v37/maclaurin15.pdf "Gradient-based Hyperparameter Optimization through Reversible Learning"

"Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum."

But it's not feasible yet. Once it is, you can imagine collapsing the whole neural-net zoo: you merely specify the input/output types and dimensions, and the system runs gradient ascent over the space of possible models, as tweaked by its internal hyperparameters.
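For a concrete sense of what such a hypergradient is, here is a hand-derived toy (my own sketch, not from the paper): chain derivatives of a validation loss backwards through a short unrolled SGD run on a 1-D quadratic, recovering the gradient of validation performance with respect to the learning rate. The losses and function name are invented for illustration; the result can be checked against the closed form for this quadratic.

```python
# Toy hypergradient: gradient of a validation loss w.r.t. the learning
# rate, obtained by chaining derivatives backwards through every SGD
# step (hand-derived sketch, not the paper's implementation).
# Train loss: 0.5*(w - t_train)**2; validation loss: 0.5*(w - t_val)**2.

def hypergrad_lr(w0, lr, steps, t_train, t_val):
    # Forward pass: plain SGD, storing the iterates.
    # (The paper avoids this storage by reversing the dynamics exactly.)
    ws = [w0]
    for _ in range(steps):
        g = ws[-1] - t_train           # d(train loss)/dw
        ws.append(ws[-1] - lr * g)
    # Backward pass: each step is w_{k+1} = w_k - lr * g_k, so
    #   d w_{k+1} / d lr  = -g_k   and   d w_{k+1} / d w_k = 1 - lr.
    d_w = ws[-1] - t_val               # d(val loss)/d w_T
    d_lr = 0.0
    for k in range(steps - 1, -1, -1):
        d_lr += d_w * -(ws[k] - t_train)
        d_w *= 1.0 - lr
    return ws[-1], d_lr

w_T, d_lr = hypergrad_lr(0.0, 0.1, steps=5, t_train=1.0, t_val=1.2)
# Closed form on this quadratic: w_T = 1 - (1 - lr)**steps, and
# d(val loss)/d(lr) = (w_T - t_val) * steps * (1 - lr)**(steps - 1)
```

An outer optimizer can then take steps on `lr` itself, which is exactly the sense in which a hyperparameter becomes just another learned parameter.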

No comments yet.