abel_ | 4 years ago
But more anecdotally, the first applied neural network paper, LeCun's 1989 digit-recognition work, has pretty much the same recipe as the GPT paper: a large neural network trained on a large dataset (both large relative to the era). https://karpathy.github.io/2022/03/14/lecun1989/
It really just seems that you need a certain number of FLOPs before certain capabilities can emerge.
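As a rough sense of scale, training compute is often estimated with the common ~6 × parameters × tokens approximation; a minimal sketch (the specific numbers below are illustrative, not from the comment):

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate using the common
    ~6 * params * tokens approximation (forward + backward pass)."""
    return 6 * n_params * n_tokens

# Illustrative GPT-3-scale run: 175B parameters, 300B tokens
print(f"{train_flops(175e9, 300e9):.2e} FLOPs")
```

By this estimate a GPT-3-scale run lands around 3e23 FLOPs, many orders of magnitude beyond what 1989-era hardware could supply, which is one way to frame "same recipe, more compute."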