top | item 42492221 (no title) merizian | 1 year ago Because of mup [0] and scaling laws, you can test ideas empirically on smaller models, with some confidence they will transfer to the larger model.[0] https://arxiv.org/abs/2203.03466 discuss order hn newest No comments yet.
No comments yet.