Mockapapella | 9 months ago
> the majority of organizations achieve less than 70% GPU Allocation Utilization when running at peak demand — to say nothing of aggregate utilization. This is true even of sophisticated players, like the former Banana serverless GPU platform, which operated at an aggregate utilization of around 20%.
Saw this sort of thing at my last job. Was very frustrating pointing this out to people only for them to respond with ¯\_(ツ)_/¯. I posted a much less tactful article (read: rant) than the one by Modal, but I think it still touches on a lot of the little things you need to consider when deploying AI models: https://thelisowe.substack.com/p/you-suck-at-deploying-ai-mo...
charles_irl | 9 months ago