Muon Is Scalable for LLM Training

5 points | renonce | 1 year ago | github.com

1 comment

yorwba | 1 year ago

For people who want to know more about the Muon optimizer: https://kellerjordan.github.io/posts/muon/