cromulen | 7 years ago

I don't know about CNTK, but TensorFlow and, I think, PyTorch don't have _good_ distributed training.

They use a distributed training model built on parameter servers, which scales nowhere near as well as Horovod's MPI-based solution.
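To make the scaling argument concrete, here's a rough back-of-the-envelope sketch. It assumes a single, unsharded parameter server and a textbook ring allreduce (the pattern Horovod is built around). The function names and the 100 MiB gradient size are made up for illustration, and real deployments shard their parameter servers, so treat it as a simplified model and not a benchmark.

```python
# Rough traffic model. All names and sizes are illustrative assumptions,
# not measurements of TensorFlow, PyTorch, or Horovod.

def parameter_server_traffic(num_workers: int, model_bytes: int) -> int:
    """Bytes handled by one (unsharded) parameter server per step.

    Every worker pushes its gradients and pulls the updated parameters,
    so the server's load grows linearly with the number of workers.
    """
    return 2 * num_workers * model_bytes


def ring_allreduce_traffic(num_workers: int, model_bytes: int) -> float:
    """Bytes sent by each worker per step in a ring allreduce.

    Reduce-scatter and all-gather each take (num_workers - 1) rounds,
    and every round sends one chunk of model_bytes / num_workers.
    """
    return 2 * (num_workers - 1) * model_bytes / num_workers


if __name__ == "__main__":
    model_bytes = 100 * 1024 * 1024  # assumed: 100 MiB of gradients
    for n in (2, 8, 64):
        ps = parameter_server_traffic(n, model_bytes)
        ring = ring_allreduce_traffic(n, model_bytes)
        print(f"{n:3d} workers: server handles {ps / 2**20:.0f} MiB, "
              f"each ring worker sends {ring / 2**20:.0f} MiB")
```

Under these assumptions the server's load keeps growing with the worker count, while the per-worker traffic in the ring stays below twice the model size no matter how many workers join.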

Even for single-machine, multi-GPU setups, only now in TensorFlow 1.8, with its Estimator MirroredStrategy, is pure TensorFlow as fast as Horovod. If you watch Tensorflow dev days 2018, the devs say they're working on bringing something like Horovod to pure TensorFlow.
