
Distributing a Fully Connected Neural Network Across a Cluster

30 points | iamtrask | 11 years ago | iamtrask.github.io

6 comments


ajtulloch|11 years ago

How is this on the front page? This is completely incoherent.

For anyone actually interested in techniques for multi-GPU DNN training, http://arxiv.org/pdf/1404.5997v2.pdf and the references therein are probably a good start.

herewego|11 years ago

Your condescension here is entirely unnecessary. Surely someone as qualified as you could have provided a more thoughtful and encouraging comment.

iamtrask|11 years ago

I apologize for the verbosity and density. Happy to answer questions, though. :)

dhaivatpandya|11 years ago

The exposition is not very clear. What exactly do you mean when you say "No edges will be communicated over the network, only half of the nodes"? I'm puzzled because, a few sentences later, you claim "The only network IO that would be required would be sending each edge value to its respective node in Q"; so the edge values are actually communicated?

From what I've understood, what you're suggesting is that for every node in a layer, you colocate the edge on the same machine?

iamtrask|11 years ago

Precisely! I highly encourage checking out the slide-deck for a graphical representation.

For every node in every other layer, I colocate its edges on the same machine. In this way, when a group of, say, 10 nodes in layer 1 are each sending a weighted message to a single node in layer 2, they can pre-combine their messages into a single weighted sum and send only that value over the network. This happens for every node in the second layer, reducing network I/O (this is the first optimization).
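A minimal sketch of the pre-combining idea described above, assuming each worker holds a contiguous shard of layer-1 nodes together with their outgoing edge weights (all names, shapes, and the two-worker split are illustrative assumptions, not from the post):

  import numpy as np

  # Hypothetical setup: layer 1 has 10 nodes, layer 2 has 4 nodes, and
  # two workers each hold 5 layer-1 nodes plus those nodes' outgoing edges.
  rng = np.random.default_rng(0)
  activations = rng.standard_normal(10)      # layer-1 node values
  weights = rng.standard_normal((10, 4))     # edge weights: layer 1 -> layer 2

  shards = [(activations[:5], weights[:5]), (activations[5:], weights[5:])]

  # Naive scheme: every weighted message crosses the network.
  # 10 source nodes x 4 destinations = 40 values sent.
  naive_values_sent = sum(a.size * w.shape[1] for a, w in shards)

  # Pre-combining scheme: each worker sums its weighted messages locally,
  # so only one partial sum per layer-2 node leaves each machine.
  # 2 workers x 4 destinations = 8 values sent.
  partials = [a @ w for a, w in shards]      # local weighted sums
  layer2_input = np.sum(partials, axis=0)    # combine at the destination

  # The pre-combined result matches the full (single-machine) matmul.
  assert np.allclose(layer2_input, activations @ weights)
  print(f"naive values over network: {naive_values_sent}, "
        f"pre-combined: {sum(p.size for p in partials)}")

The saving comes from the linearity of the weighted sum: partial sums computed per machine can be added at the destination without changing the result, which is the same reduce pattern used by standard distributed-training frameworks.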