necroforest | 2 years ago

LLMs are trained in parallel: the model weights and optimizer state are sharded across a large number of accelerators, possibly thousands.
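
As a rough sketch of what "sharded across accelerators" can look like in practice (not from the comment itself; the array `w`, the sizes, and the mesh axis name "model" are just illustrative), JAX lets you lay a weight matrix out over whatever devices are visible:

    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # One-dimensional device mesh over all visible accelerators.
    mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

    # A weight matrix sharded along its columns across the mesh.
    w = jnp.zeros((4096, 4096))
    w_sharded = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

    # jit-compiled ops on w_sharded run on all devices; the compiler inserts
    # the collectives (all-gather / reduce-scatter) that move data between them.
    @jax.jit
    def forward(x, w):
        return x @ w

    y = forward(jnp.ones((8, 4096)), w_sharded)
    print(y.sharding)  # shows how the result is laid out across devices

The same idea extends to the optimizer state (e.g. ZeRO-style sharding), which with Adam in mixed precision is typically several times larger than the weights themselves.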

The main bottleneck in this kind of distributed training is the communication between nodes.
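
To put rough numbers on that (a back-of-the-envelope only, looking at one flavor of the traffic, data-parallel gradient sync; every figure below is an assumption, not something from the comment):

    # Rough estimate of gradient all-reduce traffic per training step.
    # Model size, precision, device count, and link speed are all assumed.
    params = 70e9                  # 70B-parameter model
    bytes_per_grad = 2             # bf16 gradients
    n = 1024                       # data-parallel accelerators
    link_bytes_per_s = 100e9 / 8   # 100 Gb/s per-node link, in bytes/s

    grad_bytes = params * bytes_per_grad
    # Ring all-reduce: each device sends/receives about 2*(n-1)/n of the buffer.
    per_device = 2 * (n - 1) / n * grad_bytes

    print(f"{per_device / 1e9:.0f} GB moved per device per step")
    print(f"{per_device / link_bytes_per_s:.1f} s of pure communication per step")

Hundreds of gigabytes per device per step is why these clusters use very fast interconnects and try to overlap communication with computation as much as possible.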
