top | item 39802152

MikeBattaglia | 1 year ago

This is a good question that doesn't have a short answer. There are some different philosophical opinions about this.

One way to look at this is to note that even with forward-mode autodiff, there have historically been two different viewpoints on the whole thing, which I'll call the "computer science" view and the "algebraic" view.

The computer science view involves things that look like computational graphs with nodes that look like {"value": 123, "deriv": 456}. We are storing real values along with propagated partial derivatives. We have a custom * operator which sets a * b = {"value": a.value * b.value, "deriv": a.value * b.deriv + b.value * a.deriv}. Other functions, like exp, sin, cos, log, etc., are also extended to handle these kinds of input. There's a ton of literature that views things using this kind of framework going back to the 1960s.
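The "computer science" view above can be sketched in a few lines; the class and function names here are illustrative, not any particular library's API:

```python
import math

# A node stores a value together with its propagated partial derivative,
# and each operator implements the corresponding differentiation rule.
class Node:
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __mul__(self, other):
        # Product rule: (ab)' = a'b + ab'
        return Node(self.value * other.value,
                    self.value * other.deriv + other.value * self.deriv)

    def __add__(self, other):
        return Node(self.value + other.value, self.deriv + other.deriv)

def exp(x):
    # Chain rule: (e^u)' = e^u * u'
    return Node(math.exp(x.value), math.exp(x.value) * x.deriv)

# d/dx of x * x at x = 3: seed the input's deriv with 1
x = Node(3.0, 1.0)
y = x * x
print(y.value, y.deriv)  # 9.0 6.0
```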

The algebraic view uses dual numbers. Instead of writing {"value": 123, "deriv": 456}, we write 123 + 456h. We get the same results as the above: (a + bh) * (c + dh) = ac + (ad + bc)h. We can extend many functions in a natural way to the dual plane, such as exp, sin, cos, log, and get values there. There's also plenty of literature on these, going back to the late 1800s.
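The algebraic view computes exactly the same numbers. A minimal sketch, representing a + bh as a pair (a, b) with the rule h^2 = 0 (function names are made up for illustration):

```python
import math

def dmul(x, y):
    (a, b), (c, d) = x, y
    # (a + bh)(c + dh) = ac + (ad + bc)h, since the h^2 term vanishes
    return (a * c, a * d + b * c)

def dsin(x):
    # sin(a + bh) = sin(a) + b*cos(a)*h, from the Taylor series truncated by h^2 = 0
    a, b = x
    return (math.sin(a), b * math.cos(a))

# Evaluate x*sin(x) at x = 2 + 1h: the dual part is the derivative,
# sin(2) + 2*cos(2), just as the product rule would give.
x = (2.0, 1.0)
y = dmul(dsin(x), x)
```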

A modern view is to note that these two things are *identical*. It isn't that forward mode autodiff "uses" dual numbers; it is dual numbers. The set of elements of the form {"value": x, "deriv": y}, with addition and multiplication as stated, satisfies the axioms of a real algebra and is isomorphic to the dual numbers. We could have written {"real_part": x, "dual_part": y} if we wanted. You can see this viewpoint in some of the links I've posted elsewhere here.^[1]

So given all of that background, there are two answers to your original question. The first is to simply say that "reverse-mode autodiff doesn't use dual numbers." Many people have this view, and I would say that it really focuses on what I've called the "computer science" view above.

The second view is to note that the relationship between reverse-mode autodiff and dual numbers is the same as the relationship between reverse-mode autodiff and forward-mode autodiff. It would be silly to say that they are totally different, unrelated things: at the end of the day all we are really doing is changing the order in which we perform a bunch of multiplies of Jacobian matrices. I tend to view it as similar to the relationship between the DFT and the FFT: there is this super elegant linear algebra view involving DFT matrices. Do we say that the FFT "doesn't use matrices?" Well, I guess, but are we going to go so far as to say that it also doesn't involve linear algebra, etc? That is my view.
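The "order of Jacobian multiplies" point can be checked directly. A toy sketch with made-up 2x2 Jacobians, using a hand-rolled matmul to stay self-contained: forward mode associates the chain-rule product right-to-left (alongside the function evaluation), reverse mode left-to-right (after the forward pass), and matrix multiplication's associativity guarantees the same answer:

```python
# Plain-Python matrix multiply for small dense matrices
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# For f = f3 . f2 . f1, the Jacobian of the composition is J3 @ J2 @ J1.
# These particular matrices are invented for illustration.
J1 = [[1.0, 2.0], [0.0, 1.0]]
J2 = [[3.0, 0.0], [1.0, 1.0]]
J3 = [[1.0, 1.0]]                       # scalar output: a 1x2 Jacobian

forward = matmul(J3, matmul(J2, J1))    # right-to-left (forward mode)
reverse = matmul(matmul(J3, J2), J1)    # left-to-right (reverse mode)
assert forward == reverse               # same product, different order
```

When the output is a scalar, the left-to-right order only ever multiplies a row vector by a matrix, which is why reverse mode is the cheaper choice for scalar-valued functions of many inputs.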

There are a few other differences between reverse-mode and forward-mode autodiff. Each individual operation in the computation graph, for instance, can be thought of as an individual instance of forward-mode autodiff. In reverse-mode, on the other hand, we typically store the (back-)propagated adjoints as additional data on the input node objects, not the output nodes. This is useful if we are thinking of backpropagation on a graph. It's up to you if you view these as involving materially different theories or just differences of implementation for the sake of optimization.
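The "adjoints stored on the input nodes" description above can be sketched as follows; again, the names and structure are illustrative rather than any specific framework's API:

```python
class Var:
    def __init__(self, value):
        self.value = value
        self.adjoint = 0.0              # filled in by the backward pass
        self._backward = lambda: None

def mul(a, b):
    out = Var(a.value * b.value)
    def _backward():
        # Accumulate propagated adjoints on the *inputs*, not the output
        a.adjoint += b.value * out.adjoint
        b.adjoint += a.value * out.adjoint
    out._backward = _backward
    return out

x = Var(3.0)
y = Var(4.0)
z = mul(x, y)
z.adjoint = 1.0     # seed the output adjoint
z._backward()       # propagate back through the single multiply
print(x.adjoint, y.adjoint)  # 4.0 3.0
```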

In short, the main thing is that there's less literature on a purely algebraic version of reverse-mode autodiff in general.

[1] There is one important difference: we often think of the dual numbers as an ordered ring with "h" infinitesimal. The first viewpoint doesn't use this part of the mathematical structure - and it's very interesting to note that it isn't even necessary! Although it's cute to think of h as infinitesimal, the autodiff properties we get flow purely from the algebraic properties of having h^2 = 0, regardless of order. Of course, though, we can always just think of the dual numbers as an unordered ring, if we want.
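One way to see that only h^2 = 0 matters, with no ordering or infinitesimals anywhere in sight, is to model a + bh as the 2x2 matrix [[a, b], [0, a]], so that h is a nilpotent matrix. A small check of this (the representation is standard; the code is just a sketch):

```python
# Plain-Python 2x2 matrix multiply
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]

def dual(a, b):
    # Represent a + b*h as [[a, b], [0, a]]
    return [[a, b], [0.0, a]]

h = dual(0.0, 1.0)
assert matmul(h, h) == [[0.0, 0.0], [0.0, 0.0]]  # h^2 = 0, purely algebraically

# Evaluate x^3 at x = 2 + 1*h: the (0, 1) entry is the derivative 3*2^2 = 12
x = dual(2.0, 1.0)
x3 = matmul(matmul(x, x), x)
print(x3[0][0], x3[0][1])  # 8.0 12.0
```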
