curious_cat_163 | 8 months ago
Yes, the more recent generations of GPUs are optimized for attention math. But they are still fairly "general-purpose" accelerators as well. So when I see papers like this (interesting idea, btw!), my mental model for costs suggests that the CapEx spent on buying the GPUs and building out the data centers would get reused for this and hundreds of other ideas and experiments.
And then the hope is that the best ideas will occupy more of the available capacity...