Linode GPU Instances

172 points| gr2020 | 6 years ago |blog.linode.com

71 comments

Who_me|6 years ago

Hey peeps, full disclosure: I work as one of Linode's R&D engineers. I want to try to get to as many of these as I can.

One of the biggest questions is: why the Quadro RTX 6000? A few things:

1. Cost. It has the same performance as the 8000. The difference is 8 more GB of RAM that comes at a steep premium. Cost is important to us as it allows us to be at a more affordable price point.

2. We have all heard of or used the Tesla V100, and it's a great card. The biggest issue is that it's expensive. So one of the things that caught our eye is that the RTX 6000 has fast single-precision, tensor, and INT8 performance. Plus the Quadro RTX supports INT4. https://www.nvidia.com/content/dam/en-zz/Solutions/design-vi... https://images.nvidia.com/content/technologies/volta/pdf/tes... Yes, these are the manufacturer's numbers, but they gave us pause. As always, your mileage may vary.

3. RT cores. This is the first time (TMK) that a cloud provider is bringing RT cores into the market. There are many use cases for RT that have yet to be explored. What will we come up with as a community?!

Now, with all that being said, there is a downside: FP64, aka double precision. The Tesla V100 does this very well, whereas the Quadro RTX 6000 does poorly in comparison. Although those workloads are important, our goal was to find a solution that fits the vast majority of use cases.

So is the marketing true? Do you need a Tesla to get the best performance for ML/AI/etc.? Or is the Tesla starting to show its age? Give the cards a try; I think you'll find these new RTX Quadros with the Turing architecture are not the same as the Quadros of the past.

jamesblonde|6 years ago

If you really want low-cost compute for Deep Learning, you need lots of compute, and you don't want to pay for V100s, then the AMD Vega R7 is the card for you. 700 dollars, 16GB RAM, 1TB/s of GPU memory bandwidth (higher than the V100!), works with Tensorflow (pip install tensorflow-rocm), and about 60% of the performance on resnet-50. FP64 is not fully gimped (it is halved, I think, so still quite good). Put lots of them in servers with PCI 4.0, and you can do great things. Here's a recent talk on it:

https://www.youtube.com/watch?v=neb1C6JlEXc

trsohmers|6 years ago

> The difference is 8 more GB of RAM that comes at a steep premium

This is incorrect. The RTX 6000 has 24GB of VRAM and is $4000, and the RTX 8000 has 48GB of VRAM (double the amount) and is $5500. Is it worth the price increase? For a lot of people I know it is.

Also, the RTX Titan is $2500 and is identical to the RTX 6000 (at the chip level) and also with 24GB of VRAM, with the only difference being software enabling of additional H.264/5 encoding features on the Quadro. Definitely not worth the cost increase, especially for anyone doing ML.
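The price comparison in this comment can be put on a per-gigabyte basis. This is a minimal sketch using only the prices and VRAM sizes quoted above; the function names are illustrative, and dollars per GB is just one way to compare the cards.

```python
# Prices (USD) and VRAM sizes (GB) exactly as quoted in the comment above.
CARDS = {
    "Quadro RTX 6000": (4000, 24),
    "Quadro RTX 8000": (5500, 48),
    "Titan RTX": (2500, 24),
}


def usd_per_gb(name: str) -> float:
    """Card price divided by its VRAM size."""
    price_usd, vram_gb = CARDS[name]
    return price_usd / vram_gb


def marginal_usd_per_gb(small: str, large: str) -> float:
    """Extra dollars paid per extra GB when stepping up from one card to another."""
    (p_small, gb_small), (p_large, gb_large) = CARDS[small], CARDS[large]
    return (p_large - p_small) / (gb_large - gb_small)


for name in CARDS:
    print(f"{name}: ${usd_per_gb(name):.2f} per GB")

print(marginal_usd_per_gb("Quadro RTX 6000", "Quadro RTX 8000"))  # 62.5
```

On these figures, the extra 24GB on the RTX 8000 costs $62.50 per GB, well below the average price per GB of either 24GB card.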

tntn|6 years ago

> This is the first time (TMK) that a cloud provider is bringing RT cores into the market.

Your knowledge is incomplete. T4 has been available in google cloud for many months.

sieabahlpark|6 years ago

Has linode improved their security intrusion and disclosure policy yet?

These are great improvements but are virtually worthless if linode didn't change their behavior.

picozeta|6 years ago

I would go with Hetzner: https://www.hetzner.com/dedicated-rootserver/ex51-ssd-gpu

GTX1080 for 100$ a month. Granted, it is older, but it still works for DL. Let's say you do 10 experiments a month of ~20 hours each. That's 200 hours, or 0.5$/hour, and I don't think the RTX 6000 is 3 times faster.

If you then want to do even more training, the effective hourly price goes down even further.

//DISCLAIMER: I do not work for them, but used it for DL in the past and it was for sure cheaper than GCP or AWS. If you have to do lots of experiments (>year) go with your own hardware, but do not underestimate the convenience of >100MByte/s if you download many big training sets.
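The hourly figure in this comment comes from spreading a flat monthly fee over the GPU hours actually used. A minimal sketch of that arithmetic, using the numbers from the comment; the function name is illustrative.

```python
def effective_hourly_cost(monthly_fee: float, experiments: int,
                          hours_per_experiment: float) -> float:
    """Flat monthly fee divided by the GPU hours used in that month."""
    hours_used = experiments * hours_per_experiment
    if hours_used <= 0:
        raise ValueError("need a positive number of GPU hours")
    return monthly_fee / hours_used


# Figures from the comment: $100/month, 10 experiments of ~20 hours each.
print(effective_hourly_cost(100, 10, 20))  # 0.5
```

Doubling the usage to 400 hours a month halves the effective rate to $0.25/hour, which is why a flat fee looks better the more training you do.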

mokus|6 years ago

For traditional floating point workloads, the RTX 6000 will probably not be 3x faster. For workloads that can use the tensor ops (low-precision matrix multiply, basically: FP16, INT8, INT4), the RTX 6000 may be as much as 10-100x faster.

tarasmatsyk|6 years ago

Agree, had exactly the same experience.

It is not a server card; however, it is much faster than any old AWS instance for 1k$/m (if you happen to be an AWS user and did not want to upgrade because of the price going up 3x). TBH, 100 bucks per month is practically free, while most researchers do not have 1k$/m for a server; at that price it is cheaper to buy hardware and put Linux on it.

There are of course other options and Linode is kinda late to the party, but I am happy they made this move.

svd4anything|6 years ago

How does data in/out work in practice with them? I see this 4 Tbit bandwidth but do you happen to know what that translates to and what happens if you exceed that?

Also, "check availability" currently shows a 5 day wait: “EX51-SSD-GPU for Falkenstein (FSN1): Due to very high demand for these server models, its current setup time is approximately up to 5 workdays.*” Or maybe there are other regions/DCs.

ksec|6 years ago

I thought you were not allowed to put consumer graphics cards in a datacenter?

Or is that prohibited in US only?

krick|6 years ago

It's a flat fee of $100/month, correct? What would be the best option if the amount of training you do is rather "occasional" (but simply using colab doesn't cut it anymore)?

icelancer|6 years ago

I have one of these instances. It's awesome and I recommend it highly.

m0zg|6 years ago

Still way too much money when a 2x 2080Ti comparably specced machine under my desk costs less than 2.5 months of their billing rate, and 4x 1080Ti servers in my garage cost about 1 month of their 4-GPU machine _and_ have more SSD storage. This pricing is totally insane, especially if not billed per-minute (which in Linode's case it is not) and if there are no cheaper preemptible/spot instances.

svd4anything|6 years ago

I’m starting to think one can adopt a simple rule: switch to a DIY build whenever there is enough work to keep a GPU busy for 2 months. Otherwise, if the workload is intermittent, the better strategy is leasing, especially considering that purchase cost per unit of performance is constantly dropping.
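The rule of thumb in these two comments is a break-even calculation. The sketch below is an illustration only: the function name, the optional running-cost term, and the dollar amounts in the example are assumptions, not figures from the thread.

```python
def break_even_months(hardware_cost: float, monthly_rental: float,
                      monthly_running_cost: float = 0.0) -> float:
    """Months of continuous use after which owning becomes cheaper than renting.

    monthly_running_cost is a hypothetical catch-all for power, cooling, etc.
    """
    monthly_saving = monthly_rental - monthly_running_cost
    if monthly_saving <= 0:
        # Owning never pays off if running it costs as much as renting.
        return float("inf")
    return hardware_cost / monthly_saving


# Hypothetical example: a $2000 build versus a $1000/month instance.
print(break_even_months(2000, 1000))  # 2.0
```

The calculation assumes the GPU is busy for the whole period; for intermittent workloads the rental term shrinks with utilization and the break-even point moves further out.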

trey-jones|6 years ago

What's the cost for power? Serious question, and I'm not suggesting that this cost should account for a large percentage of the price, but genuinely curious. If your GPUs are working every hour of the month for you, how much is it costing you in electricity?

ilaksh|6 years ago

Looks amazing. Linode has worked really well for me over the years.

One thing I noticed when recently trying to get a GPU cloud instance: the high core counts are usually locked until you put in a quota increase. Then sometimes they want to call you.

So I wonder if Linode will have to do that or if they can figure out another way to handle it that would be more convenient.

I also wonder if Linode could somehow get Windows on these? I know they generally don't do anything other than Linux though. My graphics project where I am trying to run several hundred ZX Spectrum libretro cores on one screen only runs on Windows.

keytarsolo|6 years ago

That pricing isn't too bad. They come with decent SSD storage too, which is key for the large datasets that make a GPU instance worthwhile.

Linode skews more towards smaller-scale customers with many of their offerings, so I think the GPUs here make sense. The real test will be how often they upgrade them and what they upgrade them to.

hmart|6 years ago

I love Linode support. There are cheaper places but I have my Key VPSs there.

dkobran|6 years ago

Interesting to see another cloud provider go with Quadro chips. NVIDIA repackages the same silicon under several different brands (GeForce, Quadro, GRID, Tesla) and we (https://paperspace.com) have found Quadro to offer the best price/performance value. Despite minor performance differences, such as FP16 support in the Tesla family, Quadros can run all of the same workloads, e.g. graphics, HPC, Deep Learning, etc. If you’re interested in a similar instance for less $/hr, check out the Paperspace P6000.

minimaxir|6 years ago

Huh. Given that cheap cloud GPUs are nowadays sought for training AI, launching with a Workstation-oriented GPU is an odd product decision.

ksec|6 years ago

Is there any difference? It seems to support CUDA as well, so I don't see anything wrong with it.

Also seems to be a lot cheaper than AWS counterpart.

thenightcrawler|6 years ago

Isn't AWS cheaper?

edit: I could be wrong; I thought I read that AWS was .65 dollars an hour for deep learning GPU use. edit2: Did a quick look. The .65 dollars doesn't include the actual instance, so it's around 1.8 an hour on the low end. I think this is cheaper.

perennate|6 years ago

p2.xlarge comes with an NVIDIA Tesla K80 GPU for $0.90/hr, but this is now an "old" GPU and the Quadro RTX 6000 should have much higher performance (but I was unable to find any machine learning benchmarks).

p3.2xlarge has NVIDIA Tesla V100 GPU which is NVIDIA's most recent deep learning GPU, but it's $3.06/hr.

That said, AWS is among the most expensive providers if you just need a deep learning GPU (but obviously AWS offers a lot of other useful things). For example, OVH Public Cloud has Tesla V100 for $2.66/hr. And comparable NVIDIA GPUs that are not "datacenter-grade" should be even cheaper; AWS, GCP, Azure, etc. are unable to offer them because of contracts when they buy e.g. the Tesla V100.
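To compare these hourly rates with flat monthly pricing, they can be converted to a monthly cost. A minimal sketch using the rates quoted above; the 730 hours per month and the `utilization` parameter are assumptions added for illustration.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12

# Hourly on-demand rates (USD) as quoted in the comment above.
HOURLY_USD = {
    "AWS p2.xlarge (Tesla K80)": 0.90,
    "AWS p3.2xlarge (Tesla V100)": 3.06,
    "OVH Public Cloud (Tesla V100)": 2.66,
}


def monthly_cost(hourly_usd: float, utilization: float = 1.0) -> float:
    """Cost of one month at the given hourly rate and fraction of hours used."""
    return hourly_usd * HOURS_PER_MONTH * utilization


for name, rate in HOURLY_USD.items():
    print(f"{name}: ${monthly_cost(rate):.2f}/month at full utilization")
```

Passing a `utilization` below 1.0 models occasional use, which is where per-hour billing beats a flat monthly fee.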

tarasmatsyk|6 years ago

It depends. For full-time usage it is a bit more expensive; I think it is a matter of a few hundred, probably less. We've happily migrated from AWS, as just one GPU instance cost us nearly 1k/m. BTW, the newest and only GPU instances available now should be better than the RTX 6000, even though they are more expensive.

coherentpony|6 years ago

Does anybody know if there are any cloud instances with AMD GPUs?

jamesblonde|6 years ago

GPUEater do, I think. Right now, though, they are a viable option for an on-premise use case where you have a budget of, say, $100k or more, need a huge amount of compute, and have larger models to train. The Vega R7 gives you 16GB RAM (11GB in the 2080Ti) and is just slightly lower performance than the 2080Ti (322 vs 302 images/sec for resnet-50 from here: https://www.youtube.com/watch?v=neb1C6JlEXc ). And you have servers with PCI-4.0 support, so that distributed training scales (yes, the 2080Ti supports nvlink, but nvlink servers cost way more). Simple math example: a PCI-4.0 server with 256GB RAM and 8x Vega R7 should cost around $10K. With a couple of switches and racks, you can get 100s of GPUs for just a couple of hundred thousand dollars (note: only 2 GPU servers per rack is normal for now, otherwise you have to buy non-commodity racks with high power draw).
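The "simple math example" above can be written out as follows. This is a rough sketch: the defaults come from the comment ($10K per server, 8 GPUs per server, 2 servers per rack), the function name is illustrative, and the cost of switches and racks is deliberately ignored.

```python
def cluster_size(budget_usd: int, server_cost_usd: int = 10_000,
                 gpus_per_server: int = 8, servers_per_rack: int = 2):
    """Return (servers, gpus, racks) affordable on a budget, servers only."""
    servers = budget_usd // server_cost_usd
    gpus = servers * gpus_per_server
    racks = -(-servers // servers_per_rack)  # ceiling division
    return servers, gpus, racks


# "A couple of hundred thousand dollars" taken here as $200K.
print(cluster_size(200_000))  # (20, 160, 10)
```

With the networking and rack costs added back in, the real GPU count for the same budget would be somewhat lower.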

azinman2|6 years ago

What would you want it for over nvidia in the cloud?

zonidjan|6 years ago

Oh, hey! It only adds 1/5th of the GPU's purchase price.

MuffinFlavored|6 years ago

Can these be used for crypto mining at any level of efficiency? I was able to mine GRLC back in the day on AWS spot instances at a VERY mild degree of profitability.

elabajaba|6 years ago

Doubtful, since these are just fully unlocked TU102 GPUs (same as the Titan RTX, 2080ti is the same TU102 GPU but partially locked at 4352 cores vs 4608 for the Quadro RTX 6000/8000 and Titan RTX). If you could be profitable with this at $1000/month then people would be flocking out to buy 2080tis for $1100 and getting 90-95% of the hashrate.
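The 90-95% estimate is roughly what the core counts alone suggest. The sketch below assumes throughput scales linearly with core count, ignoring clock speed and memory differences; that linearity is an assumption for illustration, not something the comment states.

```python
def relative_throughput(cores: int, full_cores: int = 4608) -> float:
    """Core-count ratio against a fully unlocked TU102 (linear-scaling assumption)."""
    return cores / full_cores


def months_to_pay_off(card_price: float, monthly_rental: float) -> float:
    """Months of rental fees that add up to the purchase price of a card."""
    return card_price / monthly_rental


print(f"{relative_throughput(4352):.1%}")  # 94.4%
print(months_to_pay_off(1100, 1000))       # 1.1
```

On these numbers, a $1100 2080 Ti would match the cost of the $1000/month instance after just over a month, which is the core of the argument that mining on rented instances cannot be profitable.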

tootahe45|6 years ago

They wouldn't be available if they were profitable for that. Providers usually make you do extra verification to use these instances because people were at one time using them for mining, not because it was profitable, but because they used stolen cloud accounts/cards.

walrus01|6 years ago

Not really. Most cryptocurrency is at the stage where the only effective approach is a combination of custom ASICs and nearly free electricity. About twelve months ago I looked into mining ethereum with state of the art GPUs, and it would not have had a reasonable ROI unless I was literally paying $0.00 per kWh. And that was before its value per coin dropped a lot.