tadala|1 year ago
Everyone wants to use less compute to fit more in, but (obviously?) the solution will be to use more compute and fit less. Attention isn't (topologically) attentive enough. All these RNN-lite approaches are doomed beyond saving costs; they're going to get cooked by some other architecture that's even more expensive than transformers.
falcor84|1 year ago
ithkuil|1 year ago