necroforest | 2 years ago

LLMs are trained in parallel: the model weights and optimizer state are sharded across a large number of accelerators, possibly thousands.
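
As a rough sketch of what "sharded across accelerators" can look like in practice (not from the comment itself; the array `w`, the sizes, and the mesh axis name "model" are just illustrative), JAX lets you lay a weight matrix out over whatever devices are visible:

    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # One-dimensional device mesh over all visible accelerators.
    mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

    # A weight matrix sharded along its columns across the mesh.
    w = jnp.zeros((4096, 4096))
    w_sharded = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

    # jit-compiled ops on w_sharded run on all devices; the compiler inserts
    # the collectives (all-gather / reduce-scatter) that move data between them.
    @jax.jit
    def forward(x, w):
        return x @ w

    y = forward(jnp.ones((8, 4096)), w_sharded)
    print(y.sharding)  # shows how the result is laid out across devices

The same idea extends to the optimizer state (e.g. ZeRO-style sharding), which with Adam in mixed precision is typically several times larger than the weights themselves.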

The main bottleneck in this kind of distributed training is the communication between nodes.
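
To put rough numbers on that (a back-of-the-envelope only, looking at one flavor of the traffic, data-parallel gradient sync; every figure below is an assumption, not something from the comment):

    # Rough estimate of gradient all-reduce traffic per training step.
    # Model size, precision, device count, and link speed are all assumed.
    params = 70e9                  # 70B-parameter model
    bytes_per_grad = 2             # bf16 gradients
    n = 1024                       # data-parallel accelerators
    link_bytes_per_s = 100e9 / 8   # 100 Gb/s per-node link, in bytes/s

    grad_bytes = params * bytes_per_grad
    # Ring all-reduce: each device sends/receives about 2*(n-1)/n of the buffer.
    per_device = 2 * (n - 1) / n * grad_bytes

    print(f"{per_device / 1e9:.0f} GB moved per device per step")
    print(f"{per_device / link_bytes_per_s:.1f} s of pure communication per step")

Hundreds of gigabytes per device per step is why these clusters use very fast interconnects and try to overlap communication with computation as much as possible.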
