MikeBattaglia | 1 year ago

Sure, right here: https://github.com/pytorch/pytorch/blob/main/torch/autograd/...

Here's the documentation: https://pytorch.org/tutorials/intermediate/forward_ad_usage....

> When an input, which we call “primal”, is associated with a “direction” tensor, which we call “tangent”, the resultant new tensor object is called a “dual tensor” for its connection to dual numbers[0].
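The forward-mode API described in the linked tutorial can be exercised with a small sketch along these lines (assuming a recent PyTorch with `torch.autograd.forward_ad` available):

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.tensor([1.0, 2.0])   # the "primal" input
tangent = torch.tensor([1.0, 0.0])  # the "direction" tensor

with fwAD.dual_level():
    # Associate primal with tangent, producing a dual tensor.
    dual = fwAD.make_dual(primal, tangent)
    out = dual * dual  # f(x) = x^2
    # Unpack the result back into its primal and tangent parts.
    primal_out, tangent_out = fwAD.unpack_dual(out)

# primal_out carries f(x); tangent_out carries the directional
# derivative J(x) @ tangent, here 2*x*tangent.
```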


barfbagginus | 1 year ago

This could help settle the objection that torch doesn't implement dual number based Forward Accumulation.

But I'm wondering if it does it by implementing dual tensors and automatically 'lifting' ordinary tensor computations into dual tensor computations? That would be a little surprising to me.

The more common approach I have seen is to decorate existing operations with additional logic that accumulates and passes on a derivative value alongside the actual value during evaluation. This matters for transcendental functions, which may be computed with methods like lookup tables and approximate series that do not necessarily lend themselves to accurate dual number computations, but which do have straightforward formulas for their derivatives. It can also be a requirement when our transcendentals are computed in the FPU, which does not expose any power series through which to automatically thread our dual numbers.
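The decoration idea above can be sketched with a minimal dual-number class (names here are illustrative, not PyTorch's): the primal value of `sin` comes from the math library (standing in for the FPU or a table), while the derivative rule `cos` is attached explicitly rather than derived by pushing dual numbers through a power series.

```python
import math

class Dual:
    """Carries a value and an accumulated derivative together."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule for the derivative component.
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def sin(x):
    # The primal value may come from the FPU or a lookup table;
    # the derivative rule cos(x) is supplied explicitly.
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return math.sin(x)

# d/dx [sin(x^2)] at x = 2  ->  2x * cos(x^2) = 4 * cos(4)
x = Dual(2.0, 1.0)  # seed derivative 1.0
y = sin(x * x)
```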

It would make sense if pytorch worked this way, since it would be a bit of a stretch to expect the correct numbers to appear if we simply compute everything with dual numbers. Indeed, the original torch functions certainly exploit the FPU, so we very likely have to formulate derivatives explicitly in at least some cases.

I wonder if this observation could help heal the rift between the two positions here - it seems like your counterpart could be satisfied with the view that most forward mode AD is not quite as "pat" as just injecting a dual numbers library into existing code, but requires careful extension to accurately accumulate the derivative of each operation in the system.

I believe that reaching common ground on that fact could help your counterpart reach a satisfying conclusion here. The methods are clearly dual number in spirit, but may require subtler implementation details than the traditional dual number story, which says "dual numbers get you free derivatives with no need to extend your functions". Pointing out how this breaks down for general functions is not only true, but could serve as an olive branch and an opportunity to advance the discussion.
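The breakdown is easy to demonstrate with a toy example (entirely hypothetical, not from any library): a "transcendental" computed purely by table lookup performs no arithmetic on its input, so a naively injected dual number's derivative component is simply dropped.

```python
import math

class Dual:
    """Minimal dual number: a value plus a derivative component."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

# A toy exp() via a precomputed table keyed on rounded inputs.
TABLE = {round(k * 0.1, 1): math.exp(k * 0.1) for k in range(100)}

def table_exp(x):
    v = x.val if isinstance(x, Dual) else x
    # Returns a plain float: the tangent never enters the lookup,
    # so the derivative information is silently lost.
    return TABLE[round(v, 1)]
```

Restoring the derivative requires explicitly extending `table_exp` with its own rule (here, exp' = exp), exactly the kind of "careful extension" described above.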

Besides that remark, which I made with the intent of resolving a conflict and potentially fostering communication, I wanted to thank you for this amazing link. I did not know that pytorch had forward mode AD! I may just have to dig into it and see how they pull it off!