

lcnielsen | 4 months ago

Yeah, I worked on a lot of traditional optimization problems during my PhD, and this type of expression pops up all the time with higher-order gradient-based methods. You rescale or otherwise adjust the gradient based on some system-characteristic eigenvalues (typically those of the Hessian or an approximation of it) to promote convergence without overshooting too much.
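To make that concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the toy quadratic, its eigenvalues, the damping value, and the helper names (`plain_step`, `rescaled_step`, `run`) are all made up, and the Hessian is taken to be diagonal so its eigenvectors are just the coordinate axes. It compares ordinary gradient descent with a step that divides each gradient component by the matching eigenvalue, which is essentially a damped Newton step.

```python
# Toy objective (hypothetical): f(x) = 0.5 * sum(lam_i * x_i**2).
# Its Hessian is diagonal, so the eigenvalues are simply the lam_i.
EIGENVALUES = [100.0, 1.0]  # ill-conditioned on purpose (condition number 100)


def grad(x):
    """Gradient of the toy quadratic at x."""
    return [lam * xi for lam, xi in zip(EIGENVALUES, x)]


def plain_step(x, lr):
    """Ordinary gradient descent: one step size for every direction."""
    return [xi - lr * gi for xi, gi in zip(x, grad(x))]


def rescaled_step(x, damping=0.0):
    """Divide each gradient component by its eigenvalue (plus damping).

    With damping=0 this is an exact Newton step for this quadratic;
    a positive damping shortens the step, which limits overshooting
    when the eigenvalue estimates are unreliable.
    """
    g = grad(x)
    return [xi - gi / (lam + damping) for xi, gi, lam in zip(x, g, EIGENVALUES)]


def run(step, x0, n_steps, **kwargs):
    """Apply `step` n_steps times starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, **kwargs)
    return x


def norm(x):
    return sum(xi * xi for xi in x) ** 0.5


x0 = [1.0, 1.0]
# The plain step size is limited by the largest eigenvalue, so the
# direction with the small eigenvalue converges slowly.
plain = run(plain_step, x0, 50, lr=1.0 / max(EIGENVALUES))
rescaled = run(rescaled_step, x0, 50, damping=0.1)

print("distance to optimum, plain:   ", norm(plain))
print("distance to optimum, rescaled:", norm(rescaled))
```

After the same number of steps, the rescaled run should end up far closer to the optimum than the plain one, because the flat direction is no longer held back by the step size the steep direction requires. Real problems need the eigenvalues (or a cheaper curvature estimate) to be computed or approximated, which is where most of the practical difficulty lies.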


d3m0t3p|4 months ago

This sounds a lot like what the Muon / Shampoo optimizers do.

d3m0t3p|4 months ago

Would you have some literature on that?

lcnielsen|4 months ago

There's a ton but it's pretty scattered. Yurii Nesterov's a big name, for example.