Hey @moultano, in response to your argument about walls and nets not being in a minimum: it's my understanding that nets always live on high-dimensional saddle points, and that's commonly referred to in the literature. Even when you're optimizing, you're just moving towards ever lower-cost saddles that are closer to the optimum but almost never a local optimum (for the reasons spelled out in your post).
moultano|5 years ago
acadien|5 years ago
Another way to conceptualize these is to think of being at the minimum of a parabola in 2 dimensions, but then seeing you're not at a minimum in a 3rd dimension. Any time you're at a minimum in at least 1 dimension but not in all of them, you're (loosely speaking) on a saddle.
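To make the parabola picture concrete, here is a minimal sketch on a toy surface. The function `f(x, y) = x**2 - y**2` and the helper `curvature_along` are my own illustrative choices, not anything from the post. It measures the curvature along each axis at the origin with a finite difference: positive along x (a minimum there), negative along y (not a minimum there).

```python
def f(x, y):
    # Hypothetical toy surface: a parabola opening upward in x, downward in y.
    return x**2 - y**2

def curvature_along(axis, point=(0.0, 0.0), h=1e-3):
    # Central finite-difference estimate of the second derivative along one axis.
    x, y = point
    dx, dy = (h, 0.0) if axis == "x" else (0.0, h)
    return (f(x + dx, y + dy) - 2 * f(x, y) + f(x - dx, y - dy)) / h**2

cx = curvature_along("x")  # positive: the origin is a minimum along x
cy = curvature_along("y")  # negative: the origin is a maximum along y
is_saddle = cx > 0 and cy < 0
print(cx, cy, is_saddle)
```

In millions of dimensions the same check would mean looking at the signs of the curvature in every direction, which is why a point that is a minimum in all of them at once is so rare.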
You can extend this concept to a neural net, which lives in millions of dimensions, undergoing SGD. At the start of an optimization run, SGD moves in some direction to minimize a bundled cost, inevitably stumbling into minima in (usually) many dimensions. Subsequent iterations will shift some dimensions out of their minima and other dimensions into minima; the net is always living on a saddle during this process.
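A rough two-dimensional sketch of that picture, with heavy caveats: this is plain gradient descent (not SGD) on a made-up loss `x**2 + (y**2 - 1)**2`, and the names `loss`, `grad`, and `descend` are hypothetical. The run quickly settles into the minimum along x while still sitting near the saddle at the origin, and only later slides down the y direction toward a true minimum at y = 1.

```python
def loss(x, y):
    # Toy loss (an assumption for illustration): saddle at (0, 0), minima at (0, +1) and (0, -1).
    return x**2 + (y**2 - 1)**2

def grad(x, y):
    # Analytic gradient of the toy loss.
    return 2 * x, 4 * y * (y**2 - 1)

def descend(x, y, lr=0.1, steps=100):
    # Plain gradient descent; records (x, y, loss) at every step.
    path = [(x, y, loss(x, y))]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y, loss(x, y)))
    return path

# Start far from the minimum in x and almost exactly on the saddle in y.
path = descend(1.0, 0.01)
for step in (0, 5, 10, 20, 100):
    x, y, value = path[step]
    print(f"step {step:3d}  x={x:.4f}  y={y:.4f}  loss={value:.4f}")
```

With only two dimensions the run does eventually leave the saddle for good; the argument above is that with millions of dimensions there is almost always some direction still left to slide down.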
There are many papers that discuss the process in these terms and others that implicitly use it. I wouldn't say it's a "hot area of research" but more of a tool for thinking about these processes and sometimes gaining some insight into why things get stuck during training.
muppet_frog|5 years ago