Yeah, I saw the work from @Sree_Harsha_N, though the Adam/SGD side of that accuracy plot is very untuned. It's about what one could expect from an afternoon of working with it, but most people in the weeds with optimizers would recognize that it's not a good baseline for comparison (not to dump on the reproduction efforts).
That's why I think it might be hard to compare them accurately: SGD and Adam/AdamW likely have better potential top ends, but they're going to get thrashed in public comparisons against an optimizer that seems to perform more flatly overall. Aaron works at FAIR, so I'm assuming he knows this. I reached out with some concerns on my end a little before he published the optimizer, but unfortunately didn't hear back.
danielhanchen|1 year ago
tysam_and|1 year ago