marcusf | 3 years ago
Now, a different model architecture could upend this and displace the currently reigning transformer. Much work is ongoing on breaking the quadratic cost of self-attention in sequence length, which is what computationally bounds the transformer's performance (an example is the Hyena paper that was published just the other day).
"Biggest computer and most data wins" is still the paradigm here.
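To make the quadratic term concrete, here is a rough back-of-the-envelope sketch. It only counts the entries of the attention score matrix, which has one entry per pair of tokens; the function name `attention_score_entries` and the `num_heads` parameter are made up for illustration, and it ignores everything else in the model.

```python
def attention_score_entries(seq_len: int, num_heads: int = 1) -> int:
    """Count entries in the seq_len x seq_len attention score matrices.

    Hypothetical helper for illustration: standard self-attention compares
    every token with every other token, so each head builds a
    seq_len x seq_len matrix of scores.
    """
    if seq_len < 0 or num_heads < 1:
        raise ValueError("seq_len must be >= 0 and num_heads must be >= 1")
    return num_heads * seq_len * seq_len


if __name__ == "__main__":
    # Doubling the context length quadruples the number of score entries.
    for n in (1_024, 2_048, 4_096):
        print(n, attention_score_entries(n))
```

Doubling the sequence length quadruples the count, which is the scaling that sub-quadratic approaches are trying to get around.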