Great point, the sigmoid approximation works well for certain problems, and that's in fact what I used in the exploratory papers that led to this work. The downsides are the lack of a clear interpretation of how the original program and its smooth counterpart are related, and the difficulty of controlling the degree of smoothing as programs get longer. What DiscoGrad computes has a statistical interpretation: it's the convolution of the program output with whatever distribution is used for smoothing, typically a Gaussian with a configurable variance. On top of that, if the program branches on random numbers (which is common in simulations), that alone suffices for the maths to work out, and you get an estimate of the asymptotic gradient (for samples -> infinity) of the original program, without any artificial smoothing.
So in short, I do think it is slightly fancier :)
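To illustrate the convolution idea in a minimal, generic form (this is not DiscoGrad's actual estimator, just a sketch of Gaussian smoothing; the names `branchy_program` and `smoothed_gradient` and the chosen `sigma` are mine), one can estimate the gradient of the Gaussian-smoothed output of a discontinuous program by Monte Carlo, using the standard score-function identity for Gaussian perturbations:

```python
import numpy as np

def branchy_program(x):
    # A discontinuous program: its pointwise derivative is zero almost
    # everywhere, yet the Gaussian-smoothed expectation has a useful gradient.
    return 1.0 if x > 0 else 0.0

def smoothed_gradient(f, x, sigma=0.5, n_samples=100_000, seed=0):
    # Monte Carlo estimate of d/dx E[f(x + sigma*eps)], eps ~ N(0, 1),
    # via the Gaussian score-function identity:
    #   d/dx E[f(x + sigma*eps)] = E[f(x + sigma*eps) * eps] / sigma
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    vals = np.array([f(x + sigma * e) for e in eps])
    return float(np.mean(vals * eps) / sigma)
```

For the step function above, the smoothed output at `x = 0` is the Gaussian CDF, whose derivative there is `1/(sigma*sqrt(2*pi)) ≈ 0.80` for `sigma = 0.5`, so the estimate converges to a nonzero gradient even though the raw program's derivative is zero everywhere it exists.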
szvsw|1 year ago
frankling_|1 year ago
As an aside, the combination "known distributions + automation" is covered in the Julia world by StochasticAD.jl (https://github.com/gaurav-arya/StochasticAD.jl).
usgroup|1 year ago
If so, does it scale for very branchy programs?
Do you have any comparisons to a Gibbs based approach for any of the use case examples?
frankling_|1 year ago
We haven't done a direct comparison to MCMC approaches yet, but it's on the to-do list. My intuition is that MCMC will win out for problems where finding "just any" local optimum is not good enough.