top | item 40349611

(no title)

sveinek | 1 year ago

There is probably a simple answer to this question, but why isn't it possible to use a decentralized architecture like in crypto mining to train models?

discuss

rcxdude|1 year ago

It's not a task which benefits much from dividing it into lots of small work units that all get processed in parallel without much communication between the nodes. It's naturally almost the complete opposite: it wants very high bandwidth between all the compute units, because each iteration of the training is calculating the derivative of and then updating all the weights of the network. Splitting it up only slows it down: even if you were to distribute training amongst 10x the compute nodes each of which was 10x faster, if your bandwidth drops to even 1/2 you're gonna lose out. This is why all the really big models need a lot of very tightly integrated hardware.

xyproto|1 year ago

Just like brains.

andai|1 year ago

Can you copy a neural network, train each copy on a different part of the dataset, and merge them back together somehow?

mirekrusin|1 year ago

No. Training is offset relative to starting point. If you distribute it from same point you'll have bunch of unrelated offsets. It has to be serial - output state of one training is input state of the next.

If you could do it, we'd already have SETI like networks for AI.

magicalhippo|1 year ago

As mentioned this is difficult. AFAIK the main reason is that the power of neural nets come from the non-linear functions applied at each node ("neuron"), and thus there's nothing like the superposition principle[1] to easily combine training results.

The lack of superposition means you can't efficiently train one layer separately from the others either.

That being said, a popular non-linear function in modern neural nets is ReLU[2] which is piece-wise linear, so perhaps there's some cleverness one can do there.

[1]: https://en.wikipedia.org/wiki/Superposition_principle

[2]: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)

bilbo0s|1 year ago

There are a lot of issues with federated learning.

Really depends on your problem, but in practice, the answer is usually "no".

mendigou|1 year ago

There are multiple ways to train in parallel, and that's one of them:

https://pytorch.org/tutorials//distributed/home.html

LeoPanthera|1 year ago

Wouldn't every single participant need a copy of the entire training set?

samus|1 year ago

That's the next big problem. And there need to be mechanisms to ensure that the network is not poisoned with undesirable input.

CaptainOfCoit|1 year ago

https://scholar.google.com/scholar?q=distributed+training