sumo43 | 2 years ago
Here I'm assuming that Petals uses a large number of small, heterogeneous nodes like consumer GPUs. It might as well be something much simpler.
brucethemoose2 | 2 years ago
For inference? Yeah, but it's still better than nothing if your hardware can't run the full model, or can only run it extremely slowly.
I think frameworks like MLC-LLM and llama.cpp kinda throw a wrench in this though, as you can get very acceptable throughput on an IGP or split across a CPU/dGPU, without that huge networking penalty. And pooling complete hosts (like AI Horde) is much cheaper.
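That networking penalty can be sketched with back-of-envelope numbers. Everything below is an assumption for illustration (hop count, RTT, bandwidth, model width), not a measurement of Petals or any real deployment:

```python
# Rough per-token latency penalty of pipelining a transformer across
# remote volunteer nodes. All constants are illustrative assumptions.

HIDDEN_DIM = 8192          # hidden size of a large model (assumed)
BYTES_PER_ACT = 2          # fp16 activations
NUM_HOPS = 8               # pipeline stages across separate hosts (assumed)
RTT_S = 0.08               # 80 ms round trip per hop on consumer internet (assumed)
BANDWIDTH_BPS = 12.5e6     # ~100 Mbit/s link, in bytes/s (assumed)

# During autoregressive decoding, each hop ships one token's activation
# vector and pays one-way link latency plus transfer time.
act_bytes = HIDDEN_DIM * BYTES_PER_ACT
per_hop_s = RTT_S / 2 + act_bytes / BANDWIDTH_BPS
network_penalty_s = NUM_HOPS * per_hop_s

print(f"activation per hop: {act_bytes} bytes")
print(f"network penalty per token: {network_penalty_s * 1000:.0f} ms")
```

Under these assumptions the network alone adds roughly a third of a second per generated token, which is why splitting across an iGPU or a local CPU/dGPU pair, or pooling complete hosts, compares so favorably.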
I'm not sure what the training requirements are, but ultimately throughput is all that matters for training, especially if you can "buy" training time with otherwise idle GPU time.