item 41300458

(no title)

jl2718 | 1 year ago

I think you need higher arithmetic intensity. Gradient descent is best suited to monolithic GPUs; there could be other approaches for layer-distributed training.
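A rough illustration of the intensity point, reading "intensity" as arithmetic intensity (FLOPs performed per byte moved to or from memory). This is a minimal sketch under stated assumptions, not anything from the comment itself: it assumes a single dense fp32 matmul where each operand is read or written exactly once, and the layer shapes are hypothetical. Small per-device chunks of work sit at low intensity, so data movement rather than compute dominates.

```python
def matmul_arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for C = A @ B, with A of shape m x k and B of shape k x n.

    Assumption: ideal reuse, so A and B are each read once and C is written once.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, read B, write C
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical layer: (batch x 4096) activations times a 4096 x 4096 weight.
    for batch in (1, 16, 256, 4096):
        ai = matmul_arithmetic_intensity(batch, 4096, 4096)
        print(f"batch={batch:5d}  intensity={ai:8.1f} FLOPs/byte")
```

For a square N x N x N matmul this works out to 2N^3 / (12N^2) = N/6 FLOPs per byte, while a batch of one row stays below 0.5 FLOPs per byte regardless of hidden size.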
