Huh. I talked to some experts and they told me NN loss functions are bowl-shaped and have a single minimum, but that minimum takes a very long time to navigate to in high-dimensional spaces.
With higher feature counts the real concern is saddle points rather than local minima: near a saddle the gradient is so small that you barely move at all each iteration and get "stuck".
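
To make the "stuck" part concrete, here's a rough toy sketch (not a real NN loss): plain gradient descent on f(x, y) = x**2 - y**2, which has a saddle at the origin. The function, the starting point, the learning rate, and the step count are all assumptions I picked for illustration.

```python
import math

def grad(x, y):
    # Gradient of the toy surface f(x, y) = x**2 - y**2 (saddle at the origin).
    return 2.0 * x, -2.0 * y

def descend(x, y, lr=0.1, steps=120):
    # Plain gradient descent; records the gradient norm seen at each step.
    norms = []
    for _ in range(steps):
        gx, gy = grad(x, y)
        norms.append(math.hypot(gx, gy))
        x -= lr * gx
        y -= lr * gy
    return x, y, norms

# Start almost exactly on the axis that leads straight into the saddle.
x, y, norms = descend(1.0, 1e-8)

# Count the iterations spent crawling with a tiny gradient.
slow_steps = sum(1 for n in norms if n < 1e-2)
print(f"smallest gradient norm: {min(norms):.2e}")
print(f"steps with norm < 1e-2: {slow_steps} of {len(norms)}")
print(f"final point: x={x:.2e}, y={y:.2e}")
```

The x coordinate shrinks toward the saddle quickly, while the tiny y offset only grows by a constant factor per step, so a large share of the iterations are spent with a near-zero gradient before the run finally escapes along y. In this toy the escape direction runs off to minus infinity, which a real loss would not do; it only shows the slowdown.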
dekhn|3 years ago
Salgat|3 years ago