You're still paying more than the GPU typically costs per hour in exchange for their per-second billing, and if you don't have enough utilization to saturate an hourly rental, your users are going to keep running into cold starts, which tend to be brutal for larger models.
Their A100 80GB is going for more than what I pay to rent H100s. If you really want to save money, getting the cheapest hourly rentals possible is the only way you have any hope of coming out ahead of the major providers.
I think people vastly underestimate how much companies like OpenAI can do with inference efficiency between large nodes, large batch sizes, and hyper-optimized inference stacks.
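To put rough numbers on the per-second vs. hourly tradeoff, here's a back-of-the-envelope sketch. The prices are made up for illustration (not anyone's actual rates), the function names are my own, and it ignores cold starts entirely:

```python
def break_even_utilization(hourly_rate: float, per_second_rate: float) -> float:
    """Fraction of each hour the GPU must be busy before a flat
    hourly rental becomes cheaper than per-second billing."""
    # What per-second billing would cost if the GPU were busy the whole hour.
    per_second_hourly_equiv = per_second_rate * 3600
    return hourly_rate / per_second_hourly_equiv


def cheaper_option(hourly_rate: float, per_second_rate: float,
                   busy_seconds_per_hour: float) -> str:
    """Compare one hour of flat rental against paying only for busy seconds."""
    per_second_cost = per_second_rate * busy_seconds_per_hour
    return "hourly" if hourly_rate <= per_second_cost else "per-second"


# Hypothetical prices: $2.00/hr flat rental vs $0.001/s (i.e. $3.60/hr) per-second.
HOURLY = 2.00
PER_SECOND = 0.001

print(f"break-even utilization: {break_even_utilization(HOURLY, PER_SECOND):.0%}")
print("30 busy min/hr:", cheaper_option(HOURLY, PER_SECOND, 1800))
print("50 busy min/hr:", cheaper_option(HOURLY, PER_SECOND, 3000))
```

With those assumed prices, per-second billing only wins while the GPU is busy less than a bit over half of each hour, and that's exactly the low-utilization regime where cold starts hit hardest.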
BoorishBears|7 months ago
ivape|7 months ago