top | item 36504269


mrcggl | 2 years ago

A large H100 cluster (>10k GPUs) could likely train an LLM with 10x the compute (in FP8) of GPT-4, which was apparently trained on a mix of A100s and V100s.
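A quick back-of-envelope check of the claim. Every constant here is an assumption, not something stated in the post: ~1.98e15 dense FP8 FLOP/s per H100 SXM (spec-sheet peak), 40% model FLOPs utilization, a 90-day run, and the widely rumored (unconfirmed) ~2.15e25 FLOP training budget for GPT-4.

```python
# Back-of-envelope: FP8 compute of a 10k-GPU H100 cluster vs. GPT-4's
# rumored training budget. All constants are assumptions, not facts
# from the post.

H100_DENSE_FP8_FLOPS = 1.98e15  # per-GPU peak dense FP8, H100 SXM (assumed spec)
NUM_GPUS = 10_000
MFU = 0.40                      # assumed model FLOPs utilization
RUN_DAYS = 90

GPT4_TRAIN_FLOP = 2.15e25       # widely rumored figure, unconfirmed

seconds = RUN_DAYS * 86_400
cluster_flop = NUM_GPUS * H100_DENSE_FP8_FLOPS * MFU * seconds
multiple = cluster_flop / GPT4_TRAIN_FLOP

print(f"cluster budget: {cluster_flop:.2e} FLOP")
print(f"multiple of rumored GPT-4 budget: {multiple:.1f}x")
```

Under these particular assumptions a 90-day run lands closer to ~3x the rumored GPT-4 budget, so hitting 10x would take a longer run, more GPUs, or much higher utilization.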

No comments yet.