shoyer | 4 years ago

> Neural networks need completely different optimisation methods, and there is no practically useful application of any of the Newton or Quasi-Newton methods for their optimisation.

I don't think this is quite fair. There are several variations of 2nd-order methods, notably KFAC and Shampoo, that seem to be quite effective for large-scale neural network training; see, e.g., the intro of this paper for an overview: https://openreview.net/forum?id=-t9LPHRYKmi
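To make the idea concrete, here is a minimal sketch of a Shampoo-style preconditioned update for a single weight matrix: the gradient is multiplied on both sides by inverse fourth roots of accumulated gradient statistics. The shapes, learning rate, and epsilon are illustrative assumptions, not values from the paper.

```python
import numpy as np

def matrix_power(M, p, eps=1e-6):
    """Symmetric matrix power via eigendecomposition (eigenvalues clamped at eps)."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.maximum(w, eps) ** p) @ V.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo-style update for an m x n weight matrix W with gradient G."""
    L = L + G @ G.T   # accumulate left statistics  (m x m)
    R = R + G.T @ G   # accumulate right statistics (n x n)
    # Precondition the gradient on both sides with inverse 4th roots.
    precond = matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25)
    return W - lr * precond, L, R

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
G = rng.standard_normal((3, 4))
L, R = np.eye(3), np.eye(4)  # statistics initialised to identity (an assumption)
W_new, L, R = shampoo_step(W, G, L, R)
print(W_new.shape)  # (3, 4)
```

The real implementations batch this per layer and compute the matrix roots only every few steps for efficiency, but the two-sided preconditioning above is the core idea.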


WithinReason | 4 years ago

I checked the paper. The 2nd-order method actually reaches a minimum that is 10 times worse (i.e., it is much worse at minimisation); the results only look better because the network overfits less (Figure 3). The reviewers should have caught this!

WithinReason | 4 years ago

One of the authors is Donald Goldfarb, the G in BFGS, so maybe they are onto something. But I'm always suspicious of whether the comparisons shown in a paper are fair.