roadside_picnic | 1 month ago
I don't know anyone who would disagree with that statement, and it is the standard framing I've encountered in nearly all neural network literature and courses. The classic gradient-based papers fundamentally assume this position. Just take a quick read of "A Theoretical Framework for Back-Propagation (LeCun, 1988)" [0]; here's a quote from the abstract:
> We present a mathematical framework for studying back-propagation based on the Lagrangian formalism. In this framework, inspired by optimal control theory, back-propagation is formulated as an optimization problem with nonlinear constraints.
There's no way you can read that and not recognize that you're reading a paper on numerical methods for function approximation.
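To make that framing concrete, here's a toy sketch of my own (not from the paper): back-propagation used as exactly what it is, a numerical method for function approximation. A tiny one-hidden-layer tanh network fits y = sin(x) by gradient descent on squared error, with the chain-rule backward pass written out by hand. All names and sizes here are arbitrary choices for illustration.

```python
# Toy example: backprop as numerical function approximation.
# Fit y = sin(x) on [-pi, pi] with a 1-16-1 tanh network.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

# Parameters (sizes chosen arbitrarily for this sketch).
W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y                       # dL/dpred for L = mean squared error / 2
    # Backward pass: the chain rule, i.e. back-propagation.
    gW2 = h.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h**2)       # tanh' = 1 - tanh^2
    gW1 = X.T @ dh / len(X)
    gb1 = dh.mean(axis=0)
    # Gradient step: plain numerical minimization of approximation error.
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((pred - y) ** 2))
print(f"final MSE: {mse:.4f}")
```

Nothing here is conceptually different from any other iterative scheme for minimizing approximation error; the "learning" is the optimization.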
The issue is that Vaswani et al. never mention this relationship.
kelipso | 1 month ago