top | item 40483687

(no title)

I agree with that intuition. In our experience, it's easiest to see gains over other optimization techniques when the program is "branch-wise smooth and non-constant". Then, we get the full benefits of exact autodiff gradients "per branch", and our smoothing approach handles the branches. For SAT solving and other purely combinatorial problems, sufficiently accurate smoothing may indeed be too expensive. Also, in such problems, the average local minimum found via gradient descent may not always be that great. That said, we're still exploring where the limits really are.

discuss

eranation|1 year ago

Thank you for the responses! I'll be following your research for sure.