Show HN: Chinchilla Scaling Laws Are Not Universal
1 point | KhoomeiK | 1 year ago | github.com
Over the last several months I've been hacking on a research project to determine whether the compute-optimal allocation (scaling law) for training an LLM is sensitive to the complexity of the training data. I found that as data complexity increases, you need even more data than Chinchilla suggests!
I released the preprint just yesterday: https://arxiv.org/abs/2405.16684
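For context, here's a minimal sketch of the standard Chinchilla-style allocation that the preprint is pushing back on. It assumes the usual rule of thumb from Hoffmann et al. 2022 (roughly 20 training tokens per parameter) and the common C ≈ 6·N·D FLOP approximation; the fixed tokens-per-parameter ratio is exactly the quantity the preprint argues should shift with data complexity.

    # Sketch of the standard Chinchilla-optimal allocation, assuming
    # D ~ 20 tokens per parameter (Hoffmann et al. 2022) and training
    # compute C ~ 6 * N * D FLOPs. The fixed ratio of 20 is a fitted
    # constant; the preprint's claim is that harder data shifts the
    # optimum toward more tokens than this rule predicts.

    def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
        """Return (params N, tokens D) under C = 6*N*D with D = r*N."""
        # C = 6 * r * N^2  =>  N = sqrt(C / (6 * r)), D = r * N
        n = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
        d = tokens_per_param * n
        return n, d

    if __name__ == "__main__":
        # e.g. a 1e23-FLOP training budget
        n, d = chinchilla_optimal(1e23)
        print(f"params = {n:.3g}, tokens = {d:.3g}")
        # prints roughly 2.9e10 params and 5.8e11 tokens; under the
        # preprint's result, more complex data would move the optimum
        # toward a smaller N and a larger D at the same budget.

Sanity check against the Chinchilla run itself: at ~5.8e23 FLOPs this gives ~70B parameters and ~1.4T tokens, matching the published model.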