aab0 | 9 years ago
"Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum."
But it's not feasible at scale yet. Once it is, you can imagine collapsing the whole neural-net zoo: you merely specify the input/output types and dimensions, and the system runs gradient descent over the space of possible models, as parameterized by the internal hyperparameters.
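The core idea in the abstract, that validation performance is a differentiable function of the hyperparameters if you backprop through the whole training loop, can be sketched with modern autodiff. A minimal illustration (not the paper's method: the paper exactly *reverses* SGD-with-momentum to avoid storing the training trajectory, whereas this sketch simply unrolls a few SGD steps in memory and lets JAX differentiate through them; all names and the toy regression problem are invented for the example):

```python
import jax
import jax.numpy as jnp


def train_loss(w, x, y):
    # Ordinary squared-error training loss for a linear model.
    return jnp.mean((x @ w - y) ** 2)


def val_loss_after_training(log_lr, w0, x_tr, y_tr, x_val, y_val, steps=20):
    # Validation loss as a function of a hyperparameter (the log step size).
    # The SGD loop below is unrolled, so JAX can backprop through it.
    lr = jnp.exp(log_lr)
    w = w0
    for _ in range(steps):
        g = jax.grad(train_loss)(w, x_tr, y_tr)
        w = w - lr * g
    return train_loss(w, x_val, y_val)


# Hypergradient: d(validation loss) / d(log learning rate).
hypergrad = jax.grad(val_loss_after_training)

# Toy data: noiseless linear regression split into train/validation.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 5))
w_true = jnp.arange(1.0, 6.0)
y = x @ w_true
x_tr, y_tr, x_val, y_val = x[:48], y[:48], x[48:], y[48:]
w0 = jnp.zeros(5)

g = hypergrad(jnp.log(0.05), w0, x_tr, y_tr, x_val, y_val)
```

The resulting scalar `g` can drive an outer gradient-descent loop over the hyperparameter itself, exactly the "tune thousands of hyperparameters by gradients" picture, except that naive unrolling stores every intermediate weight vector, which is the memory blowup the paper's reversible-dynamics trick exists to avoid.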