item 38358571

alanaan | 2 years ago

Great post. Could you apply this same framework to optimize training as well?

varunshenoy | 2 years ago

Slightly different set of trade-offs, but a similar mental model. You always use large batch sizes (so you are compute bound), and the bottleneck usually ends up being communication between GPUs/nodes.
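
To make the compute-versus-communication comparison in the reply concrete, here is a rough back-of-the-envelope sketch for data-parallel training. It is not from the thread: the function names, the 40% utilization figure, and the hardware numbers in the usage lines are illustrative assumptions. It relies on two common rules of thumb, roughly 6 FLOPs per parameter per token for a forward plus backward pass, and a ring all-reduce that moves about 2(n-1)/n times the gradient size per GPU. It also ignores any overlap between communication and compute, so treat the result as a coarse estimate only.

```python
def compute_time_s(params, tokens_per_gpu, peak_flops, utilization=0.4):
    """Estimated seconds of math per step on one GPU.

    Assumes ~6 FLOPs per parameter per token (forward + backward),
    and a hypothetical fraction of peak throughput actually achieved.
    """
    return 6 * params * tokens_per_gpu / (peak_flops * utilization)


def allreduce_time_s(params, n_gpus, link_bytes_per_s, bytes_per_grad=2):
    """Estimated seconds to all-reduce gradients with a ring algorithm.

    Each GPU sends and receives about 2 * (n - 1) / n of the gradient bytes.
    bytes_per_grad=2 assumes 16-bit gradients.
    """
    if n_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    grad_bytes = params * bytes_per_grad
    return 2 * (n_gpus - 1) / n_gpus * grad_bytes / link_bytes_per_s


def bottleneck(params, tokens_per_gpu, n_gpus, peak_flops, link_bytes_per_s):
    """Return which term dominates, plus both estimates in seconds."""
    t_comp = compute_time_s(params, tokens_per_gpu, peak_flops)
    t_comm = allreduce_time_s(params, n_gpus, link_bytes_per_s)
    label = "communication" if t_comm > t_comp else "compute"
    return label, t_comp, t_comm


if __name__ == "__main__":
    # Hypothetical setup: 7B parameters, 4096 tokens per GPU per step,
    # 8 GPUs, 312 TFLOP/s peak per GPU.
    for link in (5e9, 600e9):  # slow inter-node link vs fast intra-node link
        label, t_comp, t_comm = bottleneck(7e9, 4096, 8, 312e12, link)
        print(f"link={link:.0e} B/s compute={t_comp:.2f}s comm={t_comm:.2f}s -> {label}")
```

Under these assumed numbers, raising the per-GPU batch grows the compute term while the gradient all-reduce stays fixed, which is one way to see why a slow interconnect tends to become the limiting factor as you scale across nodes.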