(no title)

lhuser123 | 7 months ago

And make it more complicated than K8s

jliptzin | 7 months ago

Not possible

vajrabum | 7 months ago

The platforms I've seen live on top of Kubernetes, so I'm afraid it is possible: nvidia-docker, all the CUDA libraries and drivers, NCCL, vLLM, ... Large-scale distributed training and inference are complicated beasties, and so is the orchestration for them.