top | item 22357194

New AMD EPYC-based Compute Engine family, now in beta

343 points | cvallejo | 6 years ago | cloud.google.com

139 comments

[+] boulos|6 years ago|reply
Disclosure: I work on Google Cloud.

This has come up a few times, so I wanted to reiterate that these are the Zen2/Rome parts not the first generation “Naples” parts. We didn’t bother launching Naples for GCE, because (as you can see) Rome is a huge step up.

[+] Bluecobra|6 years ago|reply
Are you using a custom CPU from AMD? I spun up an N2D instance and it's showing up as an "EPYC 7B12", and I can't find any details about this CPU anywhere.
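For anyone curious where that string comes from: it's the model-name field the Linux guest reports. A minimal sketch for checking it on any Linux VM (generic, not GCE-specific):

```python
# Minimal sketch: read the CPU model string a Linux guest reports.
# On an N2D instance this is where the "EPYC 7B12" name shows up.
def cpu_model(path: str = "/proc/cpuinfo") -> str:
    with open(path) as f:
        for line in f:
            if line.startswith("model name"):
                # Lines look like: "model name  : AMD EPYC 7B12"
                return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    print(cpu_model())
```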
[+] mdasen|6 years ago|reply
Since people from Google Cloud are likely here, one thing I'd like to ask/talk about: are we getting too many options for compute? One of the great things about Google Cloud was that it was very easy to order. None of this "t2.large" business where you'd have to look up how much memory and CPU it has, and potentially how many credits you're going to get per hour. I think Google Cloud is still easier, but it's getting harder to know what the right direction is.

For example, the N2D instances are basically the price of the N1 instances or even cheaper with committed-use discounts. Given that they provide 39% more performance, should the N1 instances be considered obsolete once the N2D exits beta? I know that there could be workloads that would be better on Intel than AMD, but it seems like there would be little reason to get an N1 instance once the N2D exits beta.

Likewise, the N2D has basically the same sustained-use price as the E2 instances (which only have the performance of N1 instances). What's the point of E2 instances if they're the same price? Shouldn't I be getting a discount, given that Google can use the resources more efficiently?

It's great to see the improvements at Google Cloud. I'm glad to see lower-cost, high-performance options available. However, I guess I'm left wondering who is choosing what. I look at the pricing and think, "who would choose an N1 or N2 given the N2D?" Sure, there are people with specific requirements, but it seems like the N2D should be the default in my mind.

This might sound a bit like complaining, but I do love how I can just look up memory and CPU pricing easily. Rather than having to remember name mappings, I just choose one of the families (N1, N2, E2, N2D) and look at the memory and CPU pricing. It makes it really simple to understand what you're paying. It's just that as more families get added, and Google varies how it applies sustained-use and committed-use discounts between them, it becomes more difficult to choose.

For example, if I'm going for a 1-year commitment, should I go with an E2 at $10.03/vCPU or an N2D at $12.65/vCPU? The N2D should provide more performance than the 26% price increase, yes? And why can't I get an EPYC-based E-series to really drive down costs?
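A back-of-envelope on those two numbers (a sketch using the 1-year committed prices quoted above; the ~39% uplift is the N2D-vs-N1 figure mentioned earlier, not a measured result):

```python
# Back-of-envelope: does the N2D's price premium over E2 pay for itself?
# Prices are the 1-year committed $/vCPU figures quoted above.
e2_price = 10.03
n2d_price = 12.65

premium = n2d_price / e2_price - 1
print(f"N2D price premium over E2: {premium:.1%}")  # ~26%

# If the N2D really delivers ~39% more per-vCPU performance than
# N1/E2-class cores, its cost per unit of work is lower despite the
# higher sticker price.
speedup = 1.39
print(f"$/work-unit: E2 {e2_price:.2f}  N2D {n2d_price / speedup:.2f}")
```

So on these assumptions the N2D wins on price/performance as long as its real-world speedup over E2-class cores exceeds ~26%.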

Again, I want to reiterate that Google Cloud's simpler pricing is great, but complications have crept in. E2 machines don't get sustained-use discounts, which means they're really only valuable if you're doing a yearly commitment or non-sustained use. The only time N1 machines are cheaper is with sustained use; they're the same price as Intel N2 machines if you're doing a yearly commitment or non-sustained use. Without more guidance on performance differences between the N2D and N2, why should I ever use N2? I guess this is a bit of rambling to say: keep an eye on pricing complexity. I don't like spending a lot of time thinking about optimizing costs.

[+] boulos|6 years ago|reply
Disclosure: I work on Google Cloud (and really care about this).

The challenge here is balancing diverse customer workloads against the processor vendors. Historically, at Google, we just bought a single server variant (basically) because almost all code is expected to care primarily about scale-out environments. That made the GCE decision simple: offer the same hardware we build for Google, at great prices.

The problem is that many customers have workloads and applications that they can't just change. No amount of rational discounting or incentives makes a 2 GHz processor compete with a 4 GHz processor (so now, for GCE, we buy some speedy cores and call that Compute Optimized). Even more strongly, no amount of "you're doing it wrong" is actually the right answer to "I have a database on-prem that needs several sockets and several TB of memory" (so, Memory Optimized).

There’s an important reason though that we refer to N1, N2, N2D, and E2 as “General purpose”: we think they’re a good balanced configuration, and they’ll continue to be the right default choice (and we default to these in the console). E2 is more like what we do internally at Google, by abstracting away processor choice, and so on. As a nit to your statement above, E2 does flip between Intel and AMD.

You should choose the right thing for your workloads, primarily subject to the Regions you need them in. We'll keep trying to push for simplicity in our API and offering, but customers really do have a wide range of needs, which imposes at least some minimum amount of complexity. For too long (probably) we refused to add options because of that complexity, both for us and for customers. Feel free to ignore the extra choices, though!

[+] theevilsharpie|6 years ago|reply
> For example, the N2D instances are basically the price of the N1 instances or even cheaper with committed-use discounts. Given that they provide 39% more performance, should the N1 instances be considered obsolete once the N2D exits beta?

As the name implies, N2 is a newer generation than N1. I don't think Google has announced any official N1 deprecation timeline, but that product line clearly has an expiration date.

The more direct comparison would be Intel's N2 instances, vs. AMD's N2D instances. In that case, N2 instances are likely faster on a per-core basis and support some Intel-specific instructions, whereas N2D instances are substantially less expensive.

> Again, I want to reiterate that Google Cloud's simpler pricing is great, but complications have crept in.

That seems like an unavoidable consequence of maturing as a product offering: more options means more complexity. If Google tried to streamline everything and removed options to keep things simple, they'd have another cohort of users (including myself) screaming that the product doesn't meet their needs.

I suppose a "Help Me Choose" wizard that provides some opinionated guidance can be helpful to onboarding new users, but otherwise, I don't see how Google can win here.

[+] 013a|6 years ago|reply
Realistically, a typical hyperscale cloud provider has tens or hundreds of millions of dollars invested in a specific CPU platform. It makes very little sense to throw that out chasing some ideal like "simplicity"; the world is not simple.

You can be like DigitalOcean and just say "you want a CPU core, you get a CPU core, no guarantee what it'll be". Most enterprises won't buy this. But I think there are some interesting use cases where even a hyperscale provider targeting enterprises could (and does) use this approach: not for an EC2-like product, but as the infrastructure for something like Lambda, or to run the massive number of internal workloads needed to power highly managed cloud services.

[+] scardycat|6 years ago|reply
Customers like having choices. Enterprises typically will "certify" one config and would like to stay on that till they absolutely need to move to something else.
[+] kccqzy|6 years ago|reply
I think for enterprise businesses, people just love choices. I don't know about GCP, but I do know about highly paid AWS consultants producing detailed comparisons between instance types and making recommendations for companies to "save money". Or maybe some people just like the thrill of using spreadsheets and navigating the puzzle of pricing.
[+] lallysingh|6 years ago|reply
They still own and have to pay for the old hardware.

Customers rarely have the time/energy/expertise to continuously reoptimize their cloud usage.

[+] TuringNYC|6 years ago|reply
Different chipsets may have slightly different capabilities. For example, I’ve been using NVIDIA RAPIDS recently. Not all NVIDIA cards support this particular framework’s needs. Sometimes you need to specifically direct customer installations to a specific type of card or chipset.
[+] outworlder|6 years ago|reply
> are we getting too many options for compute

As compared to what, Azure? :)

[+] rb808|6 years ago|reply
"Cloud" is a poor metaphor now. It's really a messy bunch of constellations, where some people think they can see a pretty picture and most people just see random dots.
[+] znpy|6 years ago|reply
tl;dr: I find the breadth of Google Cloud's offering confusing, so I think there should be less of it.
[+] tpetry|6 years ago|reply
Now it's getting really interesting: in the end you have to compare pricing per vCore (which is a hardware thread) against per-thread performance on AMD vs. Intel. Does anyone know of a benchmark like this? EPYC processors are most often tested on heavily parallelizable tasks, not strictly single-threaded ones.
[+] api|6 years ago|reply
From what I've seen, AMD's recent chips beat (sometimes outright destroy) Intel on multithreaded tasks, but on single-threaded tasks it's still a bit of a toss-up and depends on the workload. Intel seems to still come out ahead on some heavy numeric and scientific workloads, especially if vector instructions are used. The differences are not huge, though, and AMD solidly wins on price/performance even in cases where it's a bit slower in absolute single-threaded performance.

At this point Intel only makes sense if you have one of those single-threaded workloads where it still excels and you absolutely must have the fastest single-thread performance.

[+] boulos|6 years ago|reply
Disclosure: I work on Google Cloud.

Performance is a tricky, multi-dimensional thing, so there are many benchmarks that try to map to different workloads. For example, specint is often used for exactly your "single threaded task" benchmark, but if what you work on is numerical computing, you mostly don't care (you want specfp at the least and even that is bad).

Some people seem to really like Coremark these days. Others like specintrate. What kind of application do you care about? I'd guess plenty of folks here can provide a better estimate with that info.
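Not a proper benchmark, but for quick A/B tests between instance types a single-threaded sketch like this (a hypothetical branchy integer kernel, no relation to specint or Coremark) gives a first-order signal:

```python
# Crude single-thread microbenchmark sketch: run the same script on two
# instance types and compare wall times. This exercises one branchy
# integer kernel only -- it is NOT a substitute for specint or Coremark.
import timeit

def kernel(n: int = 20_000) -> int:
    # Collatz-style loop: branch-heavy integer work that resists
    # trivial compiler/interpreter optimization.
    total = 0
    for i in range(1, n):
        x = i
        while x != 1:
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            total += 1
    return total

# Best-of-5 to reduce noise from scheduling and frequency ramp-up.
best = min(timeit.repeat(kernel, number=1, repeat=5))
print(f"kernel best-of-5: {best:.3f}s")
```

Whatever kernel you pick, the important part is running the identical script on both machine types and comparing best-of-N wall time, not a single run.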

[+] t3rabytes|6 years ago|reply
I did some rudimentary testing with AMD vs. Intel on AWS recently and found that AMD lagged enough in single-threaded perf that it wasn't worth the savings for our workloads (Rails apps).
[+] boulos|6 years ago|reply
Disclosure: I work on Google Cloud.

cvallejo is the PM, so ask her anything!

[+] tkinz27|6 years ago|reply
It's really great to see more AMD options for cloud instances. Now I'm just waiting for more ARM architecture options. Not having to cross-compile code (a1/m6g AWS instance types) has been very useful in my day-to-day job.
[+] m0zg|6 years ago|reply
Well, they may be hypothetically "available" in us-central1, but they don't show up for me. All I see is "Cascade Lake powered" N2.
[+] pier25|6 years ago|reply
What are the implications? Higher perf and/or lower price?
[+] wmf|6 years ago|reply
Basically. This is the best server processor.
[+] dsign|6 years ago|reply
Can those instances be used in GKE node pools?
[+] boulos|6 years ago|reply
Disclosure: I work on Google Cloud.

Should be, once the rollouts complete. So once you can see N2D types in the Console for your project, I think it'll just flow naturally to GKE.

[+] ensacco|6 years ago|reply
What's the intent / timeline for N2D in other regions, e.g. us-west1?
[+] benbro|6 years ago|reply
AMD EPYC has hardware SHA acceleration, but in my test "openssl speed -evp sha1" is slower than "openssl speed sha1". Any idea why?
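One possibility worth ruling out (a guess, not a definitive answer): the EVP path adds per-call indirection, which can dominate at small block sizes, so it's worth re-running at larger buffers if your OpenSSL build supports the `-bytes` option. As a cross-check, CPython's hashlib is typically backed by OpenSSL's EVP layer, so a sketch like this should track the `-evp` numbers at the same block size:

```python
# Rough SHA-1 throughput check from Python. CPython's hashlib is
# typically backed by OpenSSL's EVP layer, so this number should track
# "openssl speed -evp sha1" at a comparable block size.
import hashlib
import time

def sha1_mb_per_s(block_size: int = 1 << 20, duration: float = 1.0) -> float:
    buf = b"\x00" * block_size
    h = hashlib.sha1()
    done = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        h.update(buf)
        done += block_size
    return done / (time.perf_counter() - start) / 1e6  # MB/s

print(f"SHA-1 via hashlib: {sha1_mb_per_s():.0f} MB/s")
```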
[+] carbocation|6 years ago|reply
Any idea when these will be available on the genomics pipeline API (now "Cloud Life Sciences" API)?
[+] lallysingh|6 years ago|reply
What's the topology of these machines? Dual socket 64c chips with some reserved (or disabled)?
[+] wmf|6 years ago|reply
They don't say, but that's the only way to provide 224 threads.
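The arithmetic, assuming 64-core Rome parts with SMT enabled (speculation from the public 224-vCPU shape, not a confirmed topology):

```python
# Back-of-envelope: a dual-socket 64-core Rome box with SMT gives 256
# hardware threads; exposing 224 vCPUs leaves 32 for the host.
sockets, cores_per_socket, smt = 2, 64, 2
total_threads = sockets * cores_per_socket * smt
print(total_threads)        # 256
print(total_threads - 224)  # 32 threads reserved/disabled
```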
[+] tempsy|6 years ago|reply
AMD's stock is wild. It was around $2 just a few years ago and has been on a non-stop climb to almost $60 today.
[+] sdesol|6 years ago|reply
They really are doing something disruptive. I can't quite remember if this is correct (it has been a while since I last studied business), but in business there is a "blue ocean strategy". The basic premise is that if you can provide a product for half the price with twice the value, you will destroy the incumbent.

What AMD is doing is really insane, in my opinion. I'm not sure if they are pricing their processors low on purpose, or if they have found a way to manufacture more cheaply, or if Intel was screwing consumers with its pricing because it was so dominant.

No matter what, AMD is able to provide something that is measurably better and significantly cheaper than the incumbent, and if the blue ocean strategy holds, they should become the new incumbent in the near future.

[+] overcast|6 years ago|reply
Wild is putting it mildly; the stock is historically extremely volatile. If past performance predicts anything, it could tank in the next couple of years, like it did the last few times it shot up. Obviously they are doing great work, but I wouldn't go all in on them for the long haul.
[+] 867-5309|6 years ago|reply
no pricing mentioned

I was surprised to discover the other day that one of my VPSs had been upgraded from 1 old Xeon 26XX core to 2 EPYC cores. Other stats: unmetered 10 Gb/s up/down, low-latency Amsterdam location, 2 GB RAM, SSD. It even outperformed my i7-8700T in a single-core OpenSSL benchmark. Most importantly, it costs €3/mo.

I really can't see Google competing with that.