thentherewere2 | 3 years ago

This is the case for most contemporary neural networks as well. It turns out that in many domains, a "good" local minimum generalizes well across many tasks.
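A rough way to see this empirically (a toy sketch I made up, nothing canonical about the data or architecture): train the same tiny net from several random seeds and compare where the runs land.

    import numpy as np

    # Fixed toy regression problem; only the weight init changes per seed.
    rng_data = np.random.default_rng(0)
    X = rng_data.normal(size=(200, 3))
    y = np.sin(X @ np.array([1.0, -2.0, 0.5]))

    def train(seed, steps=3000, lr=0.05, hidden=16):
        rng = np.random.default_rng(seed)
        W1 = rng.normal(scale=0.5, size=(3, hidden))
        W2 = rng.normal(scale=0.5, size=(hidden, 1))
        for _ in range(steps):
            H = np.tanh(X @ W1)                   # hidden activations
            err = H @ W2 - y[:, None]
            dpred = 2 * err / len(X)              # d(mean sq. error)/d(pred)
            dW2 = H.T @ dpred                     # backprop by hand
            dW1 = X.T @ ((dpred @ W2.T) * (1 - H ** 2))
            W1 -= lr * dW1
            W2 -= lr * dW2
        err = np.tanh(X @ W1) @ W2 - y[:, None]   # loss at the final weights
        return (err ** 2).mean(), W1

    for seed in range(5):
        loss, W1 = train(seed)
        print(f"seed {seed}: loss {loss:.5f}, |W1| {np.linalg.norm(W1):.2f}")

The weight norms differ run to run (different minima) while the final losses land close together.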

dekhn | 3 years ago

Huh. I talked to some experts and they told me NN loss functions are bowl-shaped with a single minimum, but that the minimum takes a very long time to navigate to in high-dimensional spaces.
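Though now I wonder: even a two-parameter toy model doesn't look bowl-shaped. A quick sketch (the y = w1 * w2 model is just the smallest example I could come up with, not anything from those experts):

    # Tiny two-layer linear model: prediction = w1 * w2, target 1, squared loss.
    # L(w1, w2) = (w1 * w2 - 1)^2
    def loss(w1, w2):
        return (w1 * w2 - 1.0) ** 2

    # Every point on the hyperbola w1 * w2 = 1 is a global minimum:
    print(loss(1.0, 1.0), loss(2.0, 0.5), loss(-1.0, -1.0))   # 0.0 0.0 0.0
    # The midpoint of the minima (1, 1) and (-1, -1) is not a minimum,
    # so the loss cannot be convex (no bowl); the origin is in fact a saddle.
    print(loss(0.0, 0.0))                                     # 1.0

So at least the "single minimum" part can't be literally true once there's more than one layer.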

Salgat | 3 years ago

For higher feature counts, the real concern is saddle points rather than local minima: points where the gradient is so small that you barely move at each iteration and get "stuck".
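You can watch that happen with the textbook saddle f(x, y) = x^2 - y^2 (a made-up minimal sketch; real nets are worse, since many Hessian eigenvalues sit near zero):

    import math

    # f(x, y) = x^2 - y^2 has a saddle at the origin. Start nearly on the
    # x-axis: gradient descent rushes toward the saddle, then stalls there,
    # because the escape direction (y) starts with an exponentially tiny
    # component that must be amplified step by step.
    x, y, lr = 1.0, 1e-8, 0.1
    for step in range(101):
        gx, gy = 2 * x, -2 * y            # gradient of f
        if step % 20 == 0:
            print(f"step {step:3d}: |grad| = {math.hypot(gx, gy):.1e}")
        x, y = x - lr * gx, y - lr * gy

The gradient norm drops by several orders of magnitude near the saddle before the y direction finally takes over, which is exactly the "barely move at all" regime.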