top | item 43054917

ryuuseijin | 1 year ago

My heart stopped for a moment when reading the title. I'm glad they haven't decided to axe GPUs, because fly GPU machines are FANTASTIC!

Extremely fast to start on demand, reliable, and a little pricey, though not unreasonably so considering the alternatives.

And the DX is amazing! It's just like any other fly machine, with no new set of commands to learn. Deploy, logs, metrics, everything just works out of the box.

Regarding the price: we tried a well-known cheaper alternative, and every once in a while, after a restart, inference performance dropped by 90%. We never figured out why, but we never had any such problems on fly.

If I'm using a cheaper "Marketplace" to run our AI workloads, I'm also not really clear on who has access to our customers' data. No such issues with fly GPUs.

All that to say, fly GPUs are a game changer for us. I could only wish for lower prices and more regions; otherwise the product is already perfect.

bottega_boy|1 year ago

I used the fly.io GPUs as development machines. For that, I generally launch a machine when I need it and scale it to 0 when I am finished. And this is what's really fantastic about fly.io - setting this up takes an hour... and the Dockerfile created in the process can also be used on any other machine. Here's a project where I used this setup: https://github.com/li-il-li/rl-enzyme-engineering

This is in stark contrast to all the other options I tried (AWS, GCP, LambdaLabs). The fly.io config really felt like something worth having in every project of mine, and on a few occasions I was able to tell people to sign up at fly.io and just run it right there (btw, signing up for GPUs always involved writing an email to them, which I think killed the momentum a bit for some people).

In my experience, the only real (if minor) flaw was the already mentioned need to embed the whole CUDA stack in your container, which easily produces container images approaching 8GB. That makes you hit some fly.io limits and also slows down builds.