item 40483448

frankling_ | 1 year ago

Yep, that's exactly it. The smoothness can either come from randomness in the program itself (then the objective function is asymptotically smooth and DiscoGrad estimates the gradient of that smooth function), or the smoothness can be introduced artificially.
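To make the "smoothing from randomness" idea concrete, here is a minimal Python sketch (not DiscoGrad's actual estimator; `branchy` and `smoothed` are made-up toy names). The raw function has a jump, so its derivative carries no information about the jump, but the expectation over input noise is a smooth function of the input:

```python
import random

def branchy(x):
    # Toy non-smooth objective: a step at x = 1 plus a smooth quadratic term.
    return (0.0 if x < 1.0 else 1.0) + (x - 2.0) ** 2

def smoothed(x, sigma=0.5, n=10_000, seed=0):
    # Monte Carlo estimate of E[branchy(x + eps)], eps ~ N(0, sigma^2).
    # This expectation is smooth in x even though branchy itself is not.
    rng = random.Random(seed)
    return sum(branchy(x + rng.gauss(0.0, sigma)) for _ in range(n)) / n
```

At the discontinuity `x = 1.0`, the smoothed value blends both sides of the step (about 0.5 from the step plus about 1.25 from the quadratic term), so a gradient of this smoothed objective exists and is informative where the raw derivative is not.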

As an example, the very first thing we looked into was a transportation engineering problem, where the red/green phases of traffic lights lead to a non-smooth optimization problem. In essence, in that case we were looking for the "best possible" parameters for a transportation simulation (in the form of a C++ program) that's full of branches.

eranation|1 year ago

Awesome, thank you! Interesting. The first thing that came to mind regarding the traffic light example is any problem that reduces to a SAT solver. I assume there are some that are clearly un-smoothable in polynomial time; otherwise this would have interesting consequences...

frankling_|1 year ago

I agree with that intuition. In our experience, it's easiest to see gains over other optimization techniques when the program is "branch-wise smooth and non-constant". Then, we get the full benefits of exact autodiff gradients "per branch", and our smoothing approach handles the branches. For SAT solving and other purely combinatorial problems, sufficiently accurate smoothing may indeed be too expensive. Also, in such problems, the average local minimum found via gradient descent may not always be that great. That said, we're still exploring where the limits really are.
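The "branch-wise smooth and non-constant" case can be sketched as follows (a toy surrogate of my own construction, not DiscoGrad's method; `THETA`, `SIGMA`, and the two branch bodies are made up). Inside each branch the derivative is exact, ordinary calculus/autodiff; only the hard branch condition is replaced by a smooth branch probability:

```python
import math

# Toy program: f(x) = x**2 if x < THETA else (x - 3)**2 -- smooth and
# non-constant inside each branch, discontinuous at the branch point.
THETA, SIGMA = 1.0, 0.3

def branch_prob(x):
    # P(x + eps < THETA) for eps ~ N(0, SIGMA^2): a smooth branch weight.
    return 0.5 * (1.0 + math.erf((THETA - x) / (SIGMA * math.sqrt(2.0))))

def smooth_f(x):
    # Weight both branch bodies by the probability of taking each branch.
    p = branch_prob(x)
    return p * x**2 + (1.0 - p) * (x - 3.0) ** 2

def smooth_grad(x):
    # Exact derivative of smooth_f: the per-branch derivatives 2x and 2(x - 3)
    # come from plain autodiff; only the branch weight needed smoothing.
    p = branch_prob(x)
    dp = -math.exp(-((THETA - x) ** 2) / (2.0 * SIGMA**2)) \
         / (SIGMA * math.sqrt(2.0 * math.pi))
    return dp * x**2 + p * 2.0 * x - dp * (x - 3.0) ** 2 + (1.0 - p) * 2.0 * (x - 3.0)
```

For a purely combinatorial problem, every "branch body" is effectively constant, so the per-branch gradients contribute nothing and all the work falls on the smoothing, which is where it gets expensive.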

epr|1 year ago

Obviously, some of this branching, etc. is not differentiable. Is it doing finite differences?

pthr|1 year ago

Looks interesting, thanks for posting and commenting here! Does it in any way attempt to find the global minimum, or will it merely enhance the descent to a local minimum of the cost function?

justinnk|1 year ago

(I am one of the authors) Generally speaking, the latter. The purpose of DiscoGrad is just to deliver useful gradients. These provide information about the local behavior of the cost function around the currently evaluated point to an optimizer of your choice, e.g., gradient descent. Interestingly, the smoothing and noise can sometimes prevent getting stuck in undesired (shallow) local minima when using gradient descent.
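One way to see why a smoothed gradient helps a plain optimizer is a plateau example (a hand-derived toy surrogate, not DiscoGrad's estimator; the objective and constants are made up). On the flat region the raw derivative is zero, so gradient descent on the raw program is stuck; the smoothed gradient "feels" the lower basin nearby and pulls the iterate across:

```python
import math

def f(x):
    # Toy non-smooth objective: a flat plateau on the left, a bowl on the right.
    return 0.5 if x < 0.0 else x * x

def grad_raw(x):
    # Raw derivative: identically zero on the plateau, so descent stalls there.
    return 0.0 if x < 0.0 else 2.0 * x

def grad_smoothed(x, sigma=0.5):
    # Closed-form derivative of E[f(x + eps)], eps ~ N(0, sigma^2), for this f.
    # Gaussian averaging lets points on the plateau "see" the bowl to the right.
    z = x / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * (x * cdf + sigma * pdf) - 0.5 * pdf / sigma

x = -0.8  # start on the plateau, where grad_raw(x) == 0
for _ in range(200):
    x -= 0.2 * grad_smoothed(x)  # plain gradient descent on the smoothed objective
```

After the loop, the iterate has left the plateau and settled near the bowl's minimum at 0, even though raw gradient descent from the same start would never move.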