item 36504269 | mrcggl | 2 years ago
A large H100 cluster (>10k GPUs) could likely train an LLM with 10x the compute (in FP8) of GPT-4, which was reportedly trained on a mix of A100s and V100s.
No comments yet.
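The claim can be sanity-checked with back-of-envelope arithmetic. The sketch below is hedged: the per-GPU FP8 throughput is NVIDIA's published dense (non-sparse) H100 figure, while the GPT-4 training-compute estimate, the model-FLOPs-utilization (MFU), and the run length are all assumptions chosen for illustration, not figures from the post.

```python
SECONDS_PER_DAY = 86_400

def cluster_training_flops(num_gpus: int,
                           peak_flops_per_gpu: float,
                           mfu: float,
                           days: float) -> float:
    """Total useful training FLOPs a cluster delivers over a run:
    GPUs x peak throughput x utilization x wall-clock seconds."""
    return num_gpus * peak_flops_per_gpu * mfu * days * SECONDS_PER_DAY

# H100 SXM dense FP8 peak (~1979 TFLOPS per NVIDIA's datasheet).
H100_FP8_DENSE = 1.979e15

# Assumption: a widely circulated (unofficial) estimate of GPT-4's
# training compute, ~2.1e25 FLOPs. Treat as illustrative only.
GPT4_FLOPS_EST = 2.1e25

# Assumptions: 10k GPUs, 35% MFU, 120-day run.
total = cluster_training_flops(10_000, H100_FP8_DENSE, mfu=0.35, days=120)
ratio = total / GPT4_FLOPS_EST

print(f"cluster delivers ~{total:.2e} FLOPs, ~{ratio:.1f}x the GPT-4 estimate")
```

Under these particular assumptions the 10k-GPU cluster lands at roughly 3-4x the assumed GPT-4 compute; reaching 10x would require some combination of more GPUs, a longer run, or higher utilization, which is consistent with the post's ">10k" qualifier.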