iliane5 | 2 years ago
I think it's mostly the scale. Once you have a consistent user base and tons of GPUs, batching inference/training across your cluster lets you process requests much faster and at a lower marginal cost.
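The amortization the comment describes can be sketched with a toy cost model (all numbers are invented for illustration): each forward pass carries a fixed overhead (kernel launches, loading weights into compute units) that is paid once per batch, so the average cost per request falls as batch size grows.

```python
# Toy cost model of batched inference (hypothetical numbers, not measurements).
FIXED_COST_PER_PASS = 100.0   # assumed fixed overhead paid once per forward pass
COST_PER_REQUEST = 1.0        # assumed incremental compute cost per request

def cost_per_request(batch_size: int) -> float:
    """Average cost of one request when served in a batch of `batch_size`."""
    return (FIXED_COST_PER_PASS + COST_PER_REQUEST * batch_size) / batch_size

if __name__ == "__main__":
    for b in (1, 8, 64, 512):
        print(f"batch={b:4d}  cost/request={cost_per_request(b):7.2f}")
```

With these made-up constants, a lone request costs 101.0 while a request in a batch of 512 costs under 1.2, which is the scale advantage the comment points at: a provider with steady traffic can keep batches full, while a small operator pays the fixed overhead almost per request.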