kir-gadjello | 3 years ago
Would you mind adding a reference link to the source, so that other people can visit my blog? I'm just starting out with blogging, and it would help me get more readers and feedback on this draft. I hope to get it into much better shape in just a few days.
More posts are in the pipeline too!
BTW, I'm 99% sure the model uses some form of sparsity, because the competitive pressure for inference efficiency is just too strong. The real question, of course, is the precise engineering detail of the chosen sparsity method. I suggest two promising methods as the most likely candidates; it could be either one alone, or both together.
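To make the sparsity point concrete, here's a minimal PyTorch sketch of one widely discussed inference-sparsity technique, top-k mixture-of-experts routing. This is purely illustrative: the class name, sizes, and the choice of MoE itself are my assumptions for the example, not the two methods from my post and not a claim about GPT-4's actual architecture.

```python
# Illustrative only: top-k MoE routing, one common way to cut inference
# FLOPs. Each token activates k of n_experts MLPs, so per-token compute
# scales with k rather than with the total parameter count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Feed-forward block where each token runs only k of n_experts MLPs."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):               # evaluate only selected experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 512])
```

With 8 experts and k=2, the MLP spends roughly a quarter of the dense FLOPs per token while keeping the full parameter count, which is exactly the kind of inference-cost win that competitive pressure rewards.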
adt | 3 years ago
I always cite my sources, and you'll find a link to your page as usual.
I also wanted to point you towards OpenAI's FIM 6.9B. Trained on 100B tokens (Chinchilla-aligned), it was announced just before GPT-4 allegedly started training. I didn't see anyone else talking about it, but maybe you can follow the rabbit trail even further, so to speak!
https://arxiv.org/abs/2207.14255