item 46927177 (no title)

maz1b | 24 days ago
AFAIK, they don't have any deals or partnerships with Groq or Cerebras or any of those kinds of companies, so how did they do this?

tcdent | 24 days ago
Inference already runs on shared hardware, so by default you aren't given the full bandwidth of the system. This most likely just allocates more resources to your request.

unknown | 24 days ago
[deleted]

hendersoon | 24 days ago
Could well be running on Google TPUs.

rvz | 23 days ago
The models are running on Google TPUs.
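tcdent's point about shared hardware can be sketched with a toy model: a server with a fixed total throughput splits capacity across concurrent requests by weight, and a "fast mode" request is simply given a larger weight. This is my own illustration of the idea, not any provider's actual scheduler; the function name, numbers, and weighting scheme are all hypothetical.

```python
def per_request_throughput(total_tps: float, weights: list[float]) -> list[float]:
    """Split a server's total tokens/sec across concurrent requests by weight.

    Toy illustration only -- real inference schedulers (batching, KV-cache
    limits, preemption) are far more involved.
    """
    total_weight = sum(weights)
    return [total_tps * w / total_weight for w in weights]

# 8 equal requests on a hypothetical 4000 tok/s server: 500 tok/s each.
equal = per_request_throughput(4000, [1] * 8)

# Same server, one request weighted 4x: it gets a bigger slice
# (~1455 tok/s) while the other seven shrink (~364 tok/s each).
boosted = per_request_throughput(4000, [4] + [1] * 7)
print(equal[0], boosted[0])
```

Under this model no new silicon is needed: the total capacity is unchanged, and speeding up one request just reallocates the shares, which is consistent with the claim that no Groq/Cerebras deal is required.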