latchkey|1 year ago
I'm in the business of MI300X. This comment nails it.
In general, the $2 GPUs come with some catch: a PE-backed venture losing money, long contracts, huge quantity commitments, PCIe-only interconnect, slow (<400G) networking, or some other limitation, like unreliable uptime from a bitcoin miner that decided to pivot into the GPU space and has zero experience running these more complicated systems.
Basically, all the things where, if you decide to build and risk your business on these sorts of providers, you "get what you pay for".
marcyb5st|1 year ago
I agree with you, but as the article mentioned, if you need to finetune a small/medium model you really don't need clusters. Getting a whole server with 8/16x H100s is more than enough. And I also agree with the article when it states that most companies are finetuning some version of llama/open-weights models today.
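A rough sanity check on why a single 8x H100 (80 GB) box covers small/medium finetunes: the bytes-per-parameter figures below are common rules of thumb (bf16 weights plus Adam optimizer states), activation memory is ignored, and the model sizes are only illustrative.

```python
# Back-of-envelope VRAM estimate for finetuning on one 8x H100 80GB node.
# Rules of thumb only; real usage depends on framework, batch size, and sequence length.

NODE_VRAM_GB = 8 * 80  # total HBM on an 8x H100 80GB server

def full_finetune_gb(params_billion):
    # ~16 bytes/param: bf16 weights (2) + bf16 grads (2) + fp32 Adam states (~12),
    # ignoring activation memory.
    return params_billion * 16

def lora_finetune_gb(params_billion):
    # Frozen bf16 base weights (~2 bytes/param) plus a small allowance (~10%)
    # for LoRA adapters, their grads, and optimizer states.
    return params_billion * 2 * 1.1

for size in (8, 13, 70):
    print(f"{size}B params: full finetune ~{full_finetune_gb(size):.0f} GB, "
          f"LoRA ~{lora_finetune_gb(size):.0f} GB (node has {NODE_VRAM_GB} GB)")
```

By this math an 8B or 13B full finetune fits comfortably on one node, a 70B model fits with LoRA-style adapters, and only very large full finetunes or training from scratch push you toward multi-node clusters.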
jsheard|1 year ago
We're not getting Folding@Home style distributed training any time soon, are we.
pico_creator|1 year ago
If the cluster is big enough for foundation model training from scratch, it's ~$3+/hr. Otherwise the price drops hard.
Problem is, "big enough" is a moving goalpost now: what was big becomes small.
swyx|1 year ago
Of course it would still cost a lot to do... but if the difference is $2/hr vs $4.49/hr then there's some size where it makes sense.
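To make that crossover concrete, here is a toy comparison at those two rates; the GPU counts and run lengths are invented examples, not figures from the thread or the article.

```python
# Illustrative only: how a $2.00/hr vs $4.49/hr per-GPU rate compounds over a run.
# GPU counts and durations below are made-up example values.

CHEAP, PREMIUM = 2.00, 4.49  # $/GPU-hour, the two rates mentioned above

def run_cost(gpus, days, rate_per_hour):
    return gpus * days * 24 * rate_per_hour

for gpus, days in [(8, 7), (64, 30), (512, 60)]:
    cheap, premium = run_cost(gpus, days, CHEAP), run_cost(gpus, days, PREMIUM)
    print(f"{gpus:>4} GPUs x {days:>2} days: ${cheap:>12,.2f} vs ${premium:>12,.2f} "
          f"(extra ${premium - cheap:,.2f})")
```

At 8 GPUs for a week the gap is a few thousand dollars; at hundreds of GPUs for a couple of months it reaches seven figures, which is the scale where the reliability and networking caveats upthread become a real trade-off rather than a rounding error.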