wsmoses | 5 years ago | on: Swift for TensorFlow Shuts Down
wsmoses's comments
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
I was (perhaps poorly) trying to explain that while AD (regardless of implementation in Enzyme, PyTorch, etc.) _can_ avoid caching values using clever tricks, it can't always get away with it. How far the cache-reduction optimizations can go really depends on the abstraction level the tool works at. If a tool can only represent the binary choice of whether an input is needed or not, it will miss the fact that perhaps only the first element (and not the whole array/tensor) is needed.
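To make that concrete, here's a minimal C sketch (all names invented, not from any real tool) of f(x) = x[0] * (x[1] + ... + x[n-1]). A fine-grained analysis sees that the reverse pass only needs two scalars, while a tool with a single whole-tensor "needed/not needed" bit would have to cache the entire array:

```c
// Fine-grained cache for f(x) = x[0] * (x[1] + ... + x[n-1]):
// the reverse pass needs only x[0] and the partial sum s,
// not the whole input array.
typedef struct { double x0, s; } Tape;  // two doubles instead of n

double f_fwd(const double *x, int n, Tape *t) {
    double s = 0.0;
    for (int i = 1; i < n; i++) s += x[i];
    t->x0 = x[0];
    t->s  = s;            // cache exactly what the reverse pass uses
    return x[0] * s;
}

// df/dx[0] = s; df/dx[i] = x[0] for i >= 1.
void f_rev(const Tape *t, int n, double *dx) {
    dx[0] = t->s;
    for (int i = 1; i < n; i++) dx[i] = t->x0;
}
```

A coarser tool that can only answer "is x needed in the reverse pass?" with yes/no would conservatively keep all n elements alive.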
Regarding Enzyme vs. JAX etc., again I think that's the wrong way to think about these tools. They solve problems at different levels and in fact can be used together for mutual benefit.
For example, a high-level AD tool in a particular DSL might know that, algebraically, it doesn't need to compute the derivative of something, since domain knowledge says it is always a constant. Without that domain knowledge, a tool has to actually compute it. On the other side of the coin, no such high-level AD tool would do all the drudgery of loop-invariant code motion, or even lower-level scheduling/register allocation (and see the Enzyme paper for reasons why these can be really useful optimizations for AD).
In an ideal world you'd combine all of this and do AD in part at whatever level offers some meaningful optimization (ideally also removing abstraction barriers like, say, a black-box call to cuDNN). We demonstrate this mix of high- and low-level AD in a minimal test case with Zygote [a high-level Julia AD tool], replacing scalar code, which is something Zygote is particularly bad at. This enables both the high-level algebraic transformations of Zygote and the low-level scalar performance of Enzyme, which is what you'd really want.
It looks like the discussion of this has dropped off for now, but I'm sure shoyer would be able to do a much better job of listing the interesting high-level tricks JAX does [and perhaps the low-level ones it misses] as a consequence of where it chose to live on the abstraction spectrum.
Also, thanks for reminding me about matrix decomposition. I actually think there's a decent chance of handling that somewhat nicely at a low level from various loop analyses, but I got distracted by a large Fortran code for nuclear particles.
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
You can explicitly define custom gradients by attaching metadata to the function you want to have the custom gradient (and Enzyme will use that even if it could differentiate the original function).
Integral types: mayyybe, depending what exactly you mean. I can imagine using custom gradient definitions to try specifying how an integral type can be used in a differentiable way (say representing a fixed point). We don't support differentiating integral types by approximating them as continuous values if that's what you're asking. There's no reason why we couldn't add this (besides perhaps bit tricks being annoying to differentiate), but haven't come across a use case.
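As a hypothetical illustration of the fixed-point idea (this is purely a sketch of what a user-supplied custom derivative could express; Enzyme does not do this automatically, and all names here are invented):

```c
#include <stdint.h>

// A Q16.16 fixed-point raw value r represents the real number r / 65536.
#define FXP_SCALE 65536.0

// Fixed-point square: if r represents x, the result represents x^2.
int32_t fxp_square(int32_t r) {
    return (int32_t)(((int64_t)r * r) / 65536);
}

// A custom derivative could declare that, with respect to the *real*
// value x = r / FXP_SCALE, d(x^2)/dx = 2x = 2r / FXP_SCALE.
double fxp_square_grad(int32_t r) {
    return 2.0 * (double)r / FXP_SCALE;
}
```

So the integer bit-twiddling itself is never differentiated; the custom gradient instead states what the integral representation means as a continuous quantity.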
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
* IR of active functions must be accessible when Enzyme is called (e.g. cannot differentiate dlopen'd functions)
* Enzyme must be able to deduce the types of operations being performed (see paper section on interprocedural type analysis for details why)
* Support for exceptions is limited (compiling with -fno-exceptions, the equivalent in a different language, or LLVM's exception-lowering pass removes this limitation).
* Support for parallel code (CPU/GPU) is ongoing [and see the prior comment on GPU parallelism for details]
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
This is also useful in the scientific world where derivatives of functions are commonplace.
You could also use it in more performance-engineering/computer-systems ways, for example using the derivatives to perform uncertainty quantification and perhaps decide to use 32-bit floats rather than 64-bit doubles.
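One way to do that precision choice (a sketch of first-order error propagation, not any particular tool's API; all names are mine) is to scale the float32 rounding error of an input by the derivative Enzyme gives you:

```c
#include <math.h>
#include <float.h>

// Rounding x to float perturbs it by at most |x| * FLT_EPSILON / 2.
// To first order, that propagates to roughly |f'(x)| * |x| * eps/2
// of error in the output.
double rounding_error_bound(double x, double dfdx) {
    return fabs(dfdx) * fabs(x) * (double)FLT_EPSILON / 2.0;
}

// Decide whether 32-bit floats are adequate for a given tolerance.
int float_is_enough(double x, double dfdx, double tol) {
    return rounding_error_bound(x, dfdx) < tol;
}
```

For f(x) = x^2 at x = 3 (so f'(x) = 6), the bound is about 1e-6: fine for a 1e-4 tolerance, not for 1e-8.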
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
One advantage, however, of doing a more whole-program approach to AD rather than individual operators is that one might be able to avoid caching values unnecessarily. For example if an input isn't modified (and still exists) by the time the value is needed in the reverse pass, you don't need to cache it but can simply use the original input without a copy.
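A tiny C sketch of that situation (names invented): for an elementwise y[i] = x[i] * w[i], if the caller guarantees neither input buffer is overwritten before the reverse pass runs, the reverse pass can read the original buffers directly with no tape copy:

```c
// Forward: y[i] = x[i] * w[i].
void fwd(const double *x, const double *w, double *y, int n) {
    for (int i = 0; i < n; i++) y[i] = x[i] * w[i];
}

// Reverse: dL/dx[i] = dL/dy[i] * w[i]; dL/dw[i] = dL/dy[i] * x[i].
// x and w here are the *original* inputs, still alive and unmodified,
// so nothing needed to be cached during the forward pass.
void rev(const double *x, const double *w, const double *dy,
         double *dx, double *dw, int n) {
    for (int i = 0; i < n; i++) {
        dx[i] = dy[i] * w[i];
        dw[i] = dy[i] * x[i];
    }
}
```

Had the forward pass overwritten x in place, those values would have had to be copied to a tape first.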
And yes PyTorch/TF tend to perform a (limited) form of AD as well, rather than do numerical differentiation (though I do think there may be an option for numerical?)
I wouldn't really position a tool like Enzyme as a competitor to PyTorch/TF (they may have some better domain-specific knowledge after all), but rather a really nice complement. Enzyme can take derivatives of arbitrary functions, in any LLVM-based language rather than the DSL of operators supported by PyTorch/TF. In fact, we built a plugin for PyTorch/TF that uses Enzyme to import custom foreign code as a differentiable layer!
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
If all of the code you care about is in one compilation unit, you're immediately good to go.
Multiple compilation units can be handled in a couple of ways, depending on how much energy you want to put into setting it up (and we're working on making this easier).
The easiest way is to compile with Link-Time Optimization (LTO) and have Enzyme run during LTO, which ensures it has access to bitcode for all potentially differentiated functions.
The slightly more difficult approach is to have Enzyme emit derivatives ahead of time, rather than lazily, for any functions you may call in an active way (incidentally, this is where Enzyme's rather aggressive activity analysis is super useful). Leveraging Enzyme's support for custom derivatives, in which an LLVM function declaration can carry metadata marking its derivative function, Enzyme can then be told to use the "custom" derivatives it generated while compiling the other compilation units. This obviously requires more setup, so I'm usually lazy and use LTO, but it can definitely be made an easier workflow.
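As a toy stand-in for that mechanism (Enzyme's real association lives in LLVM metadata/globals, not in a runtime table; this table and all names are purely illustrative):

```c
// A table associating a primal function with a derivative that was
// "pre-generated" when another compilation unit was built. Looking up
// the table stands in for Enzyme honoring custom-derivative metadata
// instead of re-deriving the body.
typedef double (*fn)(double);

double square(double x)   { return x * x; }
double d_square(double x) { return 2.0 * x; }  // the pre-generated derivative

static struct { fn primal; fn deriv; } registry[] = {
    { square, d_square },
};

fn lookup_derivative(fn primal) {
    unsigned n = sizeof registry / sizeof registry[0];
    for (unsigned i = 0; i < n; i++)
        if (registry[i].primal == primal) return registry[i].deriv;
    return 0;  // no registered derivative: would have to differentiate
}
```

The point is only that the caller never needs the body of `square` at differentiation time, which is exactly what makes the cross-compilation-unit story work.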
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
Beyond cache reduction, in our paper we demonstrate a lot of interesting ways that combining AD with a compiler can create potential speed-up. For example, we are often able to dead-code eliminate part of the original forward-pass code since it's not needed to compute the gradient.
We also have a cool example in the paper showing an asymptotic [O(N^2) => O(N)] speedup on code that normalizes a vector, because doing AD in the compiler lets Enzyme run after optimization (and, in that example, benefit from loop-invariant code motion in a way that tools outside the compiler cannot).
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
We're currently working with the Rust ML infrastructure group to make a nice integration of Enzyme into Rust (e.g. nice type-checking, safety, etc). If you're interested in helping, you should join the Rust ML meetings and/or Enzyme weekly meetings and check out https://github.com/rust-ml/Meetings and https://github.com/tiberiusferreira/oxide-enzyme/tree/c-api . There's a bunch of interesting optimizations and nicer UX for interoperability we want to add so more manpower is really helpful!
The most interesting thing from the Rust standpoint is that ideally we'd want Enzyme to be loaded into the Rust compiler as a plugin (much like it is for Julia, Clang for C/C++, etc.) -- but Rust doesn't support that option yet. This means we can either help push for plugins/custom codegen in Rust, make script-based compilation tools within rustc [I don't remember the specific name, but someone who is more of a Rust expert can surely chime in], or do the sketchy LTO approach above [not always desirable, since it requires running LTO].
Alternatively Enzyme can just become part of LLVM mainline so everyone can use it without a plugin :P We're not quite there yet but we're in the process of becoming a formal LLVM incubator project!
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
You can use existing tools within LLVM to automatically generate GPU code out of existing code, and this works perfectly fine, even running Enzyme first to synthesize the derivative.
You can also consider taking an existing GPU kernel and then automatically differentiating it. We currently support a limited set of cases for this (certain CUDA instructions, shared memory, etc.) and are working on expanding that support as well as improving performance. AD of existing general GPU kernels is interesting [and more challenging] since racy reads in the original code become racy writes in the gradient, which need extra care to make sure they don't conflict. To my knowledge, GPU AD of general programs (i.e. not one specific code) really hasn't been done before, so it's a fun research problem to work on (and if someone knows of existing tools for this, please email me at wmoses at mit dot edu).
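The reads-become-writes issue can be seen in a gather/scatter pair (a sequential C sketch; on a GPU the reverse loop's += would need an atomicAdd):

```c
// Forward (a gather): out[i] = x[idx[i]]. Many threads reading the
// same element of x concurrently is perfectly safe.
void gather(const double *x, const int *idx, double *out, int n) {
    for (int i = 0; i < n; i++) out[i] = x[idx[i]];
}

// Reverse (a scatter-add): dx[idx[i]] += dout[i]. When idx repeats,
// different iterations update the same slot -- run in parallel this
// is a data race and the increment must be made atomic.
void gather_grad(const int *idx, const double *dout, double *dx, int n) {
    for (int i = 0; i < n; i++) dx[idx[i]] += dout[i];
}
```

So a kernel that was race-free forwards can produce a gradient kernel that is not, unless the AD tool inserts the synchronization itself.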
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
First of all, they suffer from accuracy decay. For example, with the standard f'(x) \approx [f(x+h)-f(x)]/h, you subtract two similar numbers and waste many bits of precision. In contrast, if you generate the derivative function directly, like below, you end up far more accurate.
double square(double x) { return x * x; }
double d_square(double x) { return __enzyme_autodiff(square, x); }
becomes
double d_square(double x) { return 2 * x; }
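The precision loss is easy to trigger deterministically (a small C demo, with the hand-written 2*x standing in for the generated derivative): once h drops below half an ulp of x, x + h rounds back to x, so the difference quotient collapses to exactly 0 while the direct form stays exact.

```c
double square(double x)   { return x * x; }
double d_square(double x) { return 2.0 * x; }  // direct derivative

// Forward finite difference: [f(x+h) - f(x)] / h.
double fd(double (*f)(double), double x, double h) {
    return (f(x + h) - f(x)) / h;
}
```

At x = 1.0 with h = 1e-17 (below half an ulp of 1.0, which is about 1.1e-16), 1.0 + 1e-17 rounds to 1.0, so fd returns 0.0 -- the true derivative is 2.0, which d_square gives exactly.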
Secondly, from a performance perspective, numerical differentiation is really slow -- especially for gradient computation. You need to evaluate the function once per argument to get the whole gradient. In contrast, reverse-mode AD lets you generate the entire gradient in one call.
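Counting evaluations makes the cost gap concrete (a C sketch; the reverse-mode gradient is written by hand here as a stand-in for generated code):

```c
int evals = 0;  // count primal evaluations

// f(x) = sum of x_i^2
double f(const double *x, int n) {
    evals++;
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}

// Forward-difference gradient: n+1 evaluations of f for n partials.
void grad_numeric(double *x, int n, double h, double *g) {
    double f0 = f(x, n);
    for (int i = 0; i < n; i++) {
        double old = x[i];
        x[i] = old + h;
        g[i] = (f(x, n) - f0) / h;
        x[i] = old;
    }
}

// Reverse-mode gradient: all n partials (df/dx_i = 2*x_i) in one sweep.
void grad_reverse(const double *x, int n, double *g) {
    for (int i = 0; i < n; i++) g[i] = 2.0 * x[i];
}
```

For n = 1000 arguments that's 1001 full evaluations of f versus a single reverse pass of comparable cost to one evaluation.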
In addition to these generic issues, we illustrate in our paper how doing this at a compiler level allows for significant additional optimization (by removing unnecessary code from the forward pass, finding common expressions, etc).
These issues are also amplified for higher-order derivatives and so on.
wsmoses | 5 years ago | on: Enzyme – High-performance automatic differentiation of LLVM
Some more relevant links for the curious
Github: https://github.com/wsmoses/Enzyme
Paper: https://proceedings.neurips.cc/paper/2020/file/9332c513ef44b...
Basically the long story short is that Enzyme has a couple of interesting contributions:
1) Low-level Automatic Differentiation (AD) IS possible and can be high performance
2) By working at LLVM we get cross-language and cross-platform AD
3) Working at the LLVM level actually can give more speedups (since it's able to be performed after optimization)
4) We made a plugin for PyTorch/TF that uses Enzyme to import foreign code into those frameworks with ease!
wsmoses | 5 years ago | on: Enzyme: Cross-language Automatic differentiation for LLVM IR
A couple of relevant links for the curious
Github: https://github.com/wsmoses/Enzyme
Paper: https://proceedings.neurips.cc/paper/2020/file/9332c513ef44b...
Project: enzyme.mit.edu
Basically the long story short is that Enzyme has a couple of really interesting contributions:
1) Low-level AD IS possible and can be high performance
2) By working at LLVM we get cross-language and cross-platform AD
3) Working at the LLVM level actually can give more speedups (since it's able to be performed after optimization)
4) We made a plugin for PyTorch/TF that uses Enzyme to import foreign code into those frameworks with ease!