top | item 42492221

(no title)

merizian | 1 year ago

Because of mup [0] and scaling laws, you can test ideas empirically on smaller models, with some confidence they will transfer to the larger model.

[0] https://arxiv.org/abs/2203.03466

discuss

order

No comments yet.