Good question. The method computes the Hessian-inverse applied to the gradient on a batch. When people say "Newton's method" they're usually thinking of H^{-1} g, where both the Hessian H and the gradient g are computed on the full dataset. I thought saying "preconditioner" instead of "Newton's method" would make it clear that this solves H^{-1} g on a batch, not on the full dataset.
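To make the batch-vs-full-dataset distinction concrete, here's a minimal sketch on a toy least-squares problem (the function name `batch_newton_direction` and the damping value are illustrative, not from the project): the gradient and Hessian are formed on a minibatch only, so each step is a batch preconditioner rather than full-dataset Newton.

```python
import numpy as np

def batch_newton_direction(X, y, w, damping=1e-4):
    """Solve (H + damping*I)^{-1} g where g and H come from one minibatch.

    For L(w) = 0.5 * ||X w - y||^2 / n, the gradient is X^T (X w - y) / n
    and the Hessian is X^T X / n -- both on the batch, not the full dataset.
    """
    n = X.shape[0]
    g = X.T @ (X @ w - y) / n                  # batch gradient
    H = X.T @ X / n                            # batch Hessian
    # Damping keeps the solve well-posed if the batch Hessian is singular.
    return np.linalg.solve(H + damping * np.eye(H.shape[0]), g)

rng = np.random.default_rng(0)
X_full = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y_full = X_full @ w_true                       # noiseless linear targets

w = np.zeros(5)
for step in range(50):
    idx = rng.choice(1000, size=64, replace=False)  # draw a minibatch
    w -= batch_newton_direction(X_full[idx], y_full[idx], w)

print(np.allclose(w, w_true, atol=1e-2))
```

On this noiseless quadratic the batch steps converge quickly; on a real deep net the batch Hessian is only a noisy, indefinite estimate, which is exactly why "preconditioner" is the more honest name.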
hodgehog11|1 month ago
It's an interesting trick though, so I'd be curious to see how it compares to CG.
[1] https://arxiv.org/abs/2204.09266 [2] https://arxiv.org/abs/1601.04737 [3] https://pytorch-minimize.readthedocs.io/en/latest/api/minimi...
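For reference, the CG baseline mentioned above can be run matrix-free, using only Hessian-vector products rather than the explicit Hessian. A plain-NumPy sketch (the least-squares `hvp` here stands in for a real network's HVP; `cg_solve` is illustrative, not any library's API):

```python
import numpy as np

def cg_solve(hvp, g, iters=50, tol=1e-20):
    """Plain conjugate gradient for H x = g, given only H-vector products."""
    x = np.zeros_like(g)
    r = g - hvp(x)                     # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:               # residual small enough: stop early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))          # one minibatch
w = rng.normal(size=8)
y = X @ rng.normal(size=8)

n = X.shape[0]
g = X.T @ (X @ w - y) / n              # batch gradient
hvp = lambda v: X.T @ (X @ v) / n      # matrix-free batch Hessian-vector product

direction = cg_solve(hvp, g)
# Compare against the explicit dense solve on the same batch
H = X.T @ X / n
print(np.allclose(direction, np.linalg.solve(H, g)))
```

The appeal of CG here is that it never materializes H, so the same loop works when the Hessian is only available through autodiff HVPs.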
semi-extrinsic|1 month ago
throwaway198846|1 month ago
MontyCarloHall|1 month ago
rahimiali|1 month ago
Probably my nomenclature bias is that I started this project as a way to find new preconditioners for deep nets.