Derander | 11 years ago
The analytical gradient is actually a general tool from multivariable calculus. It's vector-valued: take the gradient of a function of three variables, f(x, y, z), and you get a 3-vector back. Vectors are defined by two characteristics, a direction and a magnitude. The gradient's direction is the direction of greatest increase, and its magnitude is the instantaneous rate of change in that direction.
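To make that concrete, here's a tiny Python sketch (my own example function, nothing from OP's code):

    import math

    # Illustrative function of three variables:
    # f(x, y, z) = x**2 + 3*y + z**2
    def grad_f(x, y, z):
        # Analytical partial derivatives: (df/dx, df/dy, df/dz)
        return (2 * x, 3.0, 2 * z)

    dx, dy, dz = grad_f(1.0, 2.0, -1.0)           # -> (2.0, 3.0, -2.0)
    magnitude = math.sqrt(dx**2 + dy**2 + dz**2)  # instantaneous rate of change
    # unit vector pointing in the direction of greatest increase:
    direction = (dx / magnitude, dy / magnitude, dz / magnitude)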
The gradient is being put to work here to optimize a function via a process called gradient ascent. Intuitively it makes sense that, to optimize your function, you'd want to "move" in the direction of greatest increase at every step. The size of the step you take is the tricky part. As you point out, in this case we can increase the objective function value faster if we double the step size. But you're not actually doubling /the gradient/.
If you look at the expression that you wrote, you should see:

    x + 2 * dx * step, y + 2 * dy * step
What you've done is double the step multiplier, not the gradient itself (<dx, dy>). This means the optimization process jumps further at each step. The step size that OP chose is somewhat arbitrary to begin with, though, so it's not clear why any particular choice would be better or worse. The reason step size matters is that if it's too large, your optimization process might jump right over the optimum, or something similarly bad -- there are pathological functions like the Rosenbrock function [1] that are notoriously hard to optimize with gradient methods. In practice, you'll often choose your step size more intelligently based on a few different tests, or vary it as the optimization progresses.

In this particular instance, the surface you're trying to optimize is pretty simple, so basically any step value will do the trick. It may take a different number of steps to reach the global maximum, but most reasonable step sizes will get there.
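Here's a minimal gradient ascent sketch that shows the step-size tradeoff (a toy surface I made up, not OP's actual code):

    # Toy objective with a single maximum at (3, -2); purely illustrative.
    def f(x, y):
        return -(x - 3)**2 - (y + 2)**2

    def grad(x, y):
        # Analytical gradient of f
        return (-2 * (x - 3), -2 * (y + 2))

    def ascend(step, iterations=100):
        x, y = 0.0, 0.0
        for _ in range(iterations):
            dx, dy = grad(x, y)
            x, y = x + dx * step, y + dy * step  # move uphill by step * gradient
        return x, y, f(x, y)

    print(ascend(step=0.1))  # converges to roughly (3, -2)
    print(ascend(step=0.2))  # same maximum, reached in fewer steps
    # On this surface, any step in (0, 1) converges; step > 1 overshoots
    # further with every iteration and diverges.

Note that calling ascend(step=0.2) has exactly the same effect as the x + 2 * dx * step expression above with step=0.1: doubling the multiplier and doubling the step size are the same thing.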