This is literally just the scaling laws, "Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare pretraining decisions involving optimizers, datasets, and model architectures"https://arxiv.org/html/2410.11840v1#:~:text=Scaling%20laws%2....
No comments yet.