jonathan_landy | 1 year ago

Is the claim that there aren't many local minima for high-dimensional problems, e.g. in neural network loss functions?


Imnimo | 1 year ago

Yes. To be more specific, it's that nearly all points where the derivative is zero are saddle points rather than minima. Note that some portion of this nice behavior seems to be due to design choices in modern architectures, like residual connections, rather than being a general fact about all high dimensional problems.

This paper has some nice visualizations: https://arxiv.org/pdf/1712.09913
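A minimal sketch of the saddle-vs-minimum distinction (not from the paper, just a toy illustration): f(x, y) = x^2 - y^2 has zero gradient at the origin, but its Hessian there has one positive and one negative eigenvalue, so it's a saddle. At a true local minimum, all Hessian eigenvalues would be non-negative.

```python
import numpy as np

def hessian(f, p, eps=1e-5):
    """Approximate the Hessian of f at point p by central differences."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            H[i, j] = (f(p + e_i + e_j) - f(p + e_i - e_j)
                       - f(p - e_i + e_j) + f(p - e_i - e_j)) / (4 * eps**2)
    return H

# Critical point of f(x, y) = x^2 - y^2 at the origin: gradient is zero,
# but the Hessian eigenvalues have mixed signs -> saddle point, not a minimum.
f = lambda p: p[0]**2 - p[1]**2
eigvals = np.linalg.eigvalsh(hessian(f, np.zeros(2)))
print(eigvals)  # one negative, one positive eigenvalue
```

The claim in the comment is that as dimension grows, critical points with *all* eigenvalues positive (true minima) become increasingly rare relative to points with this kind of mixed spectrum.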