muricula | 2 months ago
One solution is to reduce the scope of the problem -- you can train on a smaller, less diverse dataset such as TinyStories, a collection of 1 billion tokens of ChatGPT-generated children's stories. After about 40 hours, less than one weekend, you'll have a model that can generate mostly grammatical children's stories.
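If you want to try that, the dataset is on the Hugging Face Hub. A minimal sketch of loading and tokenizing it, assuming the Hugging Face datasets and transformers libraries -- the GPT-2 tokenizer choice here is mine, not from the comment:

    # Load and tokenize TinyStories; dataset ID is the one published
    # on the Hugging Face Hub by the TinyStories authors.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    dataset = load_dataset("roneneldan/TinyStories", split="train")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice

    def tokenize(batch):
        return tokenizer(batch["text"])

    # Tokenize in batches and drop the raw text column.
    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
    print(tokenized)

From there you'd chunk the token IDs into fixed-length sequences and feed them to whatever small transformer you're training.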
If you have a newer Mac and/or an Ultra chip, you'll have more and faster GPU cores, and might be able to train on FineWeb or another larger, more diverse dataset.
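On Apple silicon, PyTorch exposes those GPU cores through the "mps" backend; a minimal device-selection sketch:

    # Use the Apple-silicon GPU if the MPS backend is available,
    # otherwise fall back to the CPU.
    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"training on: {device}")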
gpjt | 2 months ago
I try to explain the Chinchilla paper in the post, but your favourite AI should be able to explain it well, and it has the benefit that you can ask follow-up questions.
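For reference, the paper's headline result is often reduced to a rule of thumb: compute-optimal training uses roughly 20 tokens per model parameter, with total training compute of about C = 6 * N * D FLOPs. A quick back-of-the-envelope in Python (the model size is just an example):

    # Chinchilla rule of thumb: ~20 tokens per parameter,
    # total compute C ~= 6 * N * D FLOPs.
    def chinchilla_estimate(n_params: float) -> tuple[float, float]:
        optimal_tokens = 20 * n_params          # tokens to train on
        flops = 6 * n_params * optimal_tokens   # rough total compute
        return optimal_tokens, flops

    tokens, flops = chinchilla_estimate(125e6)  # e.g. a GPT-2-small-sized model
    print(f"{tokens:.2e} tokens, {flops:.2e} FLOPs")

For a 125M-parameter model that works out to about 2.5 billion tokens, which is why a 1-billion-token dataset like TinyStories is in the right ballpark for weekend-scale training.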