What is it that makes higher order derivatives less useful at high dimensionality? Is it related to the Curse of Dimensionality, or maybe something like exploding gradients at higher orders?

mike-the-mikado|1 year ago

In n dimensions, the first derivative (the gradient) is an n-element vector. The second derivative (the Hessian) is an n x n symmetric matrix. As n grows, the cost of estimating the matrix grows at least as n^2, and the cost of using it can grow faster still: solving a dense linear system with it, as Newton's method does, takes on the order of n^3 operations.
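To put rough numbers on the n versus n^2 gap, here is a back-of-the-envelope sketch. The function names and the choice of 8-byte (float64) entries are my own assumptions for illustration, and it counts dense storage without exploiting symmetry.

```python
def gradient_storage_bytes(n, bytes_per_entry=8):
    # The gradient has one entry per dimension.
    return n * bytes_per_entry


def hessian_storage_bytes(n, bytes_per_entry=8):
    # A dense n x n Hessian has n * n entries
    # (symmetry would roughly halve this, which does not change the scaling).
    return n * n * bytes_per_entry


n = 1_000_000  # hypothetical problem size, e.g. a small neural network
print(gradient_storage_bytes(n))  # 8_000_000 bytes, about 8 MB
print(hessian_storage_bytes(n))   # 8_000_000_000_000 bytes, about 8 TB
```

So at a million dimensions the gradient fits comfortably in memory while the dense Hessian does not, before any computation with it has even started.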

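In practice, clever optimisation algorithms that use the 2nd derivative won't actually form this matrix. They typically work with Hessian-vector products or with low-rank approximations built up from successive gradients (as L-BFGS does), so the cost per step stays close to that of a few gradient evaluations.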
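As a minimal sketch of what "not forming the matrix" can look like: the product of the Hessian with a vector v can be approximated by differencing the gradient along v, which needs only two gradient evaluations and O(n) memory. The test function and all names here are assumptions chosen for illustration, and finite differences are just one option (automatic differentiation can compute the same product without the step-size error).

```python
# Hypothetical test function (an assumption for illustration):
#   f(x) = sum(x_i^4) / 4 + (sum(x_i))^2 / 2
# Its gradient is g_i = x_i^3 + sum(x).

def grad(x):
    s = sum(x)
    return [xi ** 3 + s for xi in x]


def hessian_vector_product(grad_fn, x, v, eps=1e-5):
    """Approximate H(x) @ v with a central difference of the gradient.

    Uses two gradient evaluations and O(n) memory; the n x n Hessian
    is never stored.
    """
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    g_plus = grad_fn(x_plus)
    g_minus = grad_fn(x_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


def exact_hvp(x, v):
    # For this f, H = diag(3 * x_i^2) + (all-ones matrix), so
    # (H v)_i = 3 * x_i^2 * v_i + sum(v). Used only as a check.
    sv = sum(v)
    return [3 * xi ** 2 * vi + sv for xi, vi in zip(x, v)]


x = [1.0, -2.0, 0.5]
v = [0.3, 0.1, -0.7]
approx = hessian_vector_product(grad, x, v)
exact = exact_hvp(x, v)
print(max(abs(a - e) for a, e in zip(approx, exact)))  # small discrepancy
```

Methods such as conjugate gradient only ever need products like this, which is why they can use second-order information without paying the n^2 storage cost.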