iliane5 | 2 years ago
I think it's mostly the scale. Once you have a consistent user base and tons of GPUs, batching inference/training across your cluster lets you process requests much faster and at a lower marginal cost.
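The amortization the comment describes can be sketched with a toy cost model (all numbers are invented for illustration): each forward pass carries a fixed overhead (kernel launches, loading weights into compute units) that is paid once per batch, so the average cost per request falls as batch size grows.

```python
# Toy cost model of batched inference (hypothetical numbers, not measurements).
FIXED_COST_PER_PASS = 100.0   # assumed fixed overhead paid once per forward pass
COST_PER_REQUEST = 1.0        # assumed incremental compute cost per request

def cost_per_request(batch_size: int) -> float:
    """Average cost of one request when served in a batch of `batch_size`."""
    return (FIXED_COST_PER_PASS + COST_PER_REQUEST * batch_size) / batch_size

if __name__ == "__main__":
    for b in (1, 8, 64, 512):
        print(f"batch={b:4d}  cost/request={cost_per_request(b):7.2f}")
```

With these made-up constants, a lone request costs 101.0 while a request in a batch of 512 costs under 1.2, which is the scale advantage the comment points at: a provider with steady traffic can keep batches full, while a small operator pays the fixed overhead almost per request.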