(no title)
ddelnano | 4 months ago
The authors mention that NCCL and Ray initialization were too slow (quoted below), but from their description it sounds like they've reimplemented a layer that frameworks like NIXL and UCCL are increasingly standardizing.
> Distributed executor: Inference engines support model parallelism via distributed executors (e.g., Ray [32] and NCCL [9]), whose initialization takes tens of seconds.