Think the massive increase in demand is due to mass inference of open source LLMs? Or is the transformer architecture driving mass inference of other models too?
You might be thinking too far into this. The biggest customers are bulk buyers that are either training on a private cluster (e.g., Meta or OpenAI) or selling their rack space to other businesses. These are the people paying money and driving demand for GPU hardware; what happens to the businesses they serve hardly matters as long as those businesses pay for the compute. The "driver" for this demand is the hype. If people were laser-focused on the best-value solution, then everyone would pay for OpenAI's compute, since it's cheaper than GPU hosting.
The real root of the problem is that GPU compute is not a competitive market. The demand is less for GPUs and more for Nvidia hardware specifically, because nobody else is shipping CUDA or CUDA-equivalent software. Demand is thus artificially raised beyond whatever is reasonable, since buyers aren't shopping in a competitive market. It's basically the same story as what happened to Nvidia's hardware during the crypto mining rush.
> then everyone would pay for OpenAI's compute since it's cheaper than GPU hosting.
This is absolutely not true. The gap is narrowing as providers (Google, Anthropic, DeepSeek) introduce cross-request KV caching, but it's definitely not true for OAI (yet).
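To make the caching point concrete, here is a rough back-of-the-envelope sketch of how cross-request KV (prompt) caching changes per-request API cost. All prices and the cache-hit fraction are hypothetical placeholders, not any real provider's rates:

```python
# Sketch: why cross-request KV caching narrows the API-vs-self-hosting cost gap.
# All prices below are hypothetical placeholders, not real provider rates.

def request_cost(prompt_tokens, output_tokens, cached_fraction,
                 input_price=3.00, cached_price=0.30, output_price=15.00):
    """Cost in dollars for one request; prices are per million tokens.
    Prompt tokens that hit the KV cache are billed at the cheaper cached_price."""
    cached = prompt_tokens * cached_fraction
    fresh = prompt_tokens - cached
    return (fresh * input_price
            + cached * cached_price
            + output_tokens * output_price) / 1_000_000

# A chat turn reusing a long shared prefix: 90% of the prompt is a cache hit.
no_cache = request_cost(100_000, 1_000, cached_fraction=0.0)
with_cache = request_cost(100_000, 1_000, cached_fraction=0.9)
print(f"no cache:   ${no_cache:.3f}")    # prompt billed entirely at full rate
print(f"with cache: ${with_cache:.3f}")  # most of the prompt billed at cached rate
```

With a long shared prefix, most of the bill is prompt tokens, so a steep cached-token discount cuts the per-request cost severalfold; that's the mechanism behind the narrowing gap.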