tysam_and|1 year ago
Hence why I think it might be hard to compare them accurately: SGD and Adam/AdamW likely have better potential top ends, but they are going to get thrashed more in public comparisons against an optimizer that seems to perform more flatly overall. Aaron works at FAIR, so I am assuming that he knows this. I reached out with some concerns on my end a little before he published the optimizer, but unfortunately didn't hear back.
danielhanchen|1 year ago
tysam_and|1 year ago
96 legitimately is pretty hard. I struggled to do it even in 2 minutes, so seeing it in 45 seconds is crazy. It definitely gets exponentially harder for every fraction of a percent, so I think that's a pretty big achievement to hit :D