Just a heads up in case you didn't know: taking the Hessian over minibatches is indeed referred to as Stochastic Newton, and methods of this kind have been studied for quite some time [1][2]. The Newton system is usually solved with CG using Hessian-vector products, so the Hessian never has to be formed or inverted explicitly, and this tends to work pretty well (see e.g. the pytorch-minimize API [3]). The only problem is that the Hessian is often singular or indefinite, so you need a regularizer / damping term (same as here, I believe). Newton methods do work at scale, but no one with the resources to try them at scale seems to be aware of them. It's an interesting trick though, so I'd be curious to see how it compares to CG. A rough sketch of the CG variant is below the links.
[1] https://arxiv.org/abs/2204.09266
[2] https://arxiv.org/abs/1601.04737
[3] https://pytorch-minimize.readthedocs.io/en/latest/api/minimi...
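For the curious, here's a minimal sketch of one damped stochastic Newton step in PyTorch. It's my own illustrative code, not taken from the papers above: it solves (H + lam*I) p = -g with plain CG, touching H only through Hessian-vector products via double backprop. The function name, lam, and the iteration counts are all placeholders.

    import torch

    def newton_cg_step(loss, params, lam=1e-3, cg_iters=50, tol=1e-6):
        # Flatten a tuple of per-parameter tensors into one vector.
        flat = lambda ts: torch.cat([t.reshape(-1) for t in ts])
        # Gradient with create_graph=True so we can differentiate through it.
        g = flat(torch.autograd.grad(loss, params, create_graph=True))

        def hvp(v):
            # Pearlmutter trick: Hessian-vector product via double backprop,
            # plus Tikhonov damping lam*v so the system is positive definite.
            hv = torch.autograd.grad(g, params, grad_outputs=v, retain_graph=True)
            return flat(hv) + lam * v

        # Plain conjugate gradient on (H + lam*I) p = -g.
        b = -g.detach()
        p = torch.zeros_like(b)
        r, d = b.clone(), b.clone()
        rs = r @ r
        for _ in range(cg_iters):
            Hd = hvp(d).detach()
            alpha = rs / (d @ Hd)
            p = p + alpha * d
            r = r - alpha * Hd
            rs_new = r @ r
            if rs_new.sqrt() < tol:
                break
            d = r + (rs_new / rs) * d
            rs = rs_new
        return p  # flattened Newton direction; unflatten before applying

    # Toy usage: one Newton direction for a tiny least-squares model.
    model = torch.nn.Linear(10, 1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    direction = newton_cg_step(loss, list(model.parameters()))

In practice you'd truncate CG early (truncated/inexact Newton) and pick the step size with a line search or trust region rather than taking the full step.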
conformist|1 month ago
IIRC, GMRES is another popular Krylov subspace method for this; unlike CG it doesn't require the operator to be symmetric positive definite, which matters when the Hessian is indefinite.
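As a toy illustration (assuming SciPy; the random matrix below is a stand-in for a Hessian, not anything from the thread), GMRES only needs matrix-vector products, so a Hessian-vector-product routine plugs straight in:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    n = 100
    rng = np.random.default_rng(0)
    A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # toy non-symmetric "Hessian"
    g = rng.standard_normal(n)                         # toy gradient

    # Wrap the matvec in a LinearOperator; the matrix is never factored.
    H = LinearOperator((n, n), matvec=lambda v: A @ v)
    p, info = gmres(H, -g, maxiter=200)
    print("converged" if info == 0 else f"gmres info={info}")

CG would be the wrong tool for this toy system because the matrix isn't symmetric; when the system is SPD, CG is usually the cheaper choice per iteration.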