top | item 29394571

AWS Graviton 3 Instances

174 points | Trisell | 4 years ago | aws.amazon.com | reply

110 comments

[+] ghshephard|4 years ago|reply
So - for those deeper into security - is this useful?

"Graviton3 processors also include a new pointer authentication feature that is designed to improve security. Before return addresses are pushed on to the stack, they are first signed with a secret key and additional context information, including the current value of the stack pointer. When the signed addresses are popped off the stack, they are validated before being used. An exception is raised if the address is not valid, thereby blocking attacks that work by overwriting the stack contents with the address of harmful code. We are working with operating system and compiler developers to add additional support for this feature, so please get in touch if this is of interest to you"

[+] spijdar|4 years ago|reply
Very useful, depending on the implementation and potential trade-offs. If the performance is good, this is a nice extra layer that makes return-oriented programming more difficult. Combined with NX bits, it really raises the difficulty in developing/using many types of exploits.

(It's not impossible to bypass; I'm vaguely aware it's been done on Apple's new chips, which implement a similar (the same?) ARM extension. But there's no perfect security.)
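To make the mechanism concrete, here's a toy sketch in Python. Illustrative only: real PAC uses the QARMA block cipher in dedicated hardware, with keys the program can't read, and the tag goes in whatever pointer bits the virtual address size leaves unused. The key, addresses, and 16-bit tag width below are all made up.

```python
import hmac, hashlib

KEY = b"per-process secret key"  # hypothetical; real PAC keys live in system registers

def pac_sign(ret_addr: int, sp: int) -> int:
    """Like PACIASP: MAC the return address with the stack pointer as
    context, and stash a truncated tag in the unused top 16 bits."""
    assert ret_addr >> 48 == 0, "toy model: top 16 bits must be free"
    msg = ret_addr.to_bytes(8, "little") + sp.to_bytes(8, "little")
    tag = int.from_bytes(hmac.digest(KEY, msg, hashlib.sha256)[:2], "little")
    return (tag << 48) | ret_addr

def pac_auth(signed: int, sp: int) -> int:
    """Like AUTIASP: recompute the tag and fault on mismatch."""
    addr = signed & ((1 << 48) - 1)
    if pac_sign(addr, sp) != signed:
        raise ValueError("pointer authentication failure")  # CPU would fault here
    return addr

sp = 0x7FFD_0000
signed = pac_sign(0x40_0123, sp)
assert pac_auth(signed, sp) == 0x40_0123  # legitimate return: validates fine
```

An attacker overwriting the stack with an address of their choosing doesn't know the key, so (barring a lucky tag collision in this toy's 16 bits) the forged pointer fails authentication before it's ever used.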

[+] ljhsiung|4 years ago|reply
Pointer authentication has been around for several years already. As with many things in hardware, though, it takes time for the software ecosystem around it to mature. Still, I've found it to be quite influential.

Here are a couple "real world" examples--

Project Zero had a blog post about some of the weaknesses in the original Pointer Auth spec [0], and even had a follow-up [1].

Here is an example of what some mitigation might look like, showing how gets(), which is a classically trivially vulnerable primitive, becomes not-so-trivial (but still feasible enough to do in a blogpost, obviously) [2].

Cost-wise, in terms of both hardware and software, it's rather cheap. The hardware to support this isn't too expensive, about on par with a multiplier. On the software end, like I said, it's taken some time to mature and gotten to a pretty good state IMO, with basically all compilers providing simple usage since 2019-- just turn on a flag!

ARM also did a performance vs. ROP gadget reduction analysis [3]. The takeaway, as others have mentioned, is that while it isn't a complete mitigation, it heavily increases attack complexity at rather low cost.

In fact, I'm rather annoyed Amazon didn't include this feature on Graviton2, and to claim it as new or innovative on their end feels just like marketing speak. Any CPU that claims to be ARMv8.5-a compliant *must* have this feature, and that's been around for quite a few years now.

[0]: https://googleprojectzero.blogspot.com/2019/02/examining-poi...

[1]: https://bazad.github.io/presentations/BlackHat-USA-2020-iOS_...

[2]: https://blog.ret2.io/2021/06/16/intro-to-pac-arm64/

[3]: https://developer.arm.com/documentation/102433/0100/Applying...

[+] tyingq|4 years ago|reply
It would take the heat off of mitigating buffer overflow CVEs in a rushed way. Many of those give remote code execution, so each one typically means a frenzied patching exercise. A little more time to do the patching in a more deliberate way would be nice.
[+] staticassertion|4 years ago|reply
I've heard very promising things about pointer authentication.
[+] ksec|4 years ago|reply
>Graviton3 will deliver up to 25% more compute performance and up to twice as much floating point & cryptographic performance. On the machine learning side, Graviton3 includes support for bfloat16 data and will be able to deliver up to 3x better performance.

>First in the cloud industry to be equipped with DDR5 memory.

Quite hard to tell whether this is Neoverse V1 or N2, since the description fits both. But the SVE extensions will make a lot of workloads suitable for Graviton3 that previously weren't a fit for Graviton2.

Edit: Judging from the doubled floating-point performance, it should be N2 with SVE2. Which also means Graviton3 will be ARMv9 and on 5nm. No wonder TSMC doubled its 5nm expansion spending. It will be interesting to see how they price G3 vs. G2; much lower-priced G2 instances would be very attractive.
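(Aside on bfloat16, since it's doing the heavy lifting in the 3x ML claim: it's just float32 with the low 16 mantissa bits dropped, keeping the same 8-bit exponent range but only ~3 decimal digits of precision. A quick illustrative sketch in Python, ignoring NaN edge cases:)

```python
import struct

def to_bfloat16(x: float) -> float:
    """Convert a float to bfloat16 precision (round-to-nearest-even on
    the dropped bits), returned as a float for easy inspection."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bfloat16(3.14159))  # 3.140625: only ~3 decimal digits survive
```

That's why it works well for ML training (gradients care about dynamic range more than precision) while being half the memory traffic of float32.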

[+] Uehreka|4 years ago|reply
I’m going to look up what SVE extensions are, but before I do, how much work (as a proportion of all work done on EC2) couldn’t be done on G2? I generally go off the assumption that most EC2 instances are hosting web servers and database servers, along with a relative handful of CI servers and perhaps a sprinkling of video transcoders, 3D renderers, and ML trainers. How much of that work can’t be done with the operations supported by G2? Is it just the long tail?
[+] ksec|4 years ago|reply
Can't edit it now. It should be V1, not N2.
[+] qbasic_forever|4 years ago|reply
Where are Google and Azure with ARM instances? It's been nothing but crickets for years now... this is starting to get silly that their customers can't at least start getting workloads on a different architecture, nevermind get better performance per dollar, etc. too. The silence is deafening.
[+] ksec|4 years ago|reply
>The silence is deafening

Remember Amazon bought Annapurna Labs in 2015, and only released their first Graviton instances in 2018. The lead time for a server CPU product is at least a year even when you have blueprints, and that's ignoring fab capacity booking and many other things like testing. And without scale (AWS is bigger than GCP and Azure combined) it is hard to gain a competitive advantage (which often delays management decision-making).

I think you should see Azure and GCP ARM offerings in late 2022. Marvell's exit statement on ARM server SoCs all but confirmed Google and Microsoft are working on their own ARM offerings.

[+] baybal2|4 years ago|reply
> Where are Google and Azure with ARM instances?

I think they are bound by long term supply agreements with Intel. They will just bargain for better prices with Intel.

Not an easy task, given that Intel is jammed for capacity.

[+] lostmsu|4 years ago|reply
Funnily, IBM offers ARM instances. Even on free tier.
[+] talawahtech|4 years ago|reply
I was wondering how they were going to manage the fact that AMD's Zen 3-based instances would likely be faster than Graviton2. Color me impressed. AWS' pace of innovation is blistering.
[+] amelius|4 years ago|reply
I really hate it that big companies are rolling their own CPU now. Soon, you're not a serious developer if you don't have your own CPU. And everybody is stuck in some walled garden.

I mean, it's great that the threshold to produce ICs is now lower, but is this really the way forward? Shouldn't we have separate CPU companies, so that everybody can benefit from progress, not only the mega corporations?

[+] ksec|4 years ago|reply
>I really hate it that big companies are rolling their own CPU now. Soon, you're not a serious developer if you don't have your own CPU. And everybody is stuck in some walled garden.

It is still just ARM. You can buy ARM chips everywhere. There is no walled garden.

> Shouldn't we have separate CPU companies, so that everybody can benefit from progress,

You benefit from the same CPU designs from ARM, and the same fab improvements from TSMC, amortised across the whole industry. Doesn't get any better than that.

[+] minedwiz|4 years ago|reply
I mean, it's just ARM - pretty standard architecture these days. If the big companies want to compete on chip design, I don't see it as all that different from AMD, Intel (and Via if you count them) competing on x86-compatibles.
[+] lizthegrey|4 years ago|reply
You can get Ampere chips which are not proprietary to any specific cloud provider. Both Packet/Equinix Metal and Oracle use them.
[+] zokier|4 years ago|reply
I'd point out that vendor-specific architectures were the norm for a long time in history, and the sky didn't fall. x86 dominance was a relatively short anomaly more than anything.
[+] pjmlp|4 years ago|reply
Let's not pretend using a Z80, 6502, or 68000 made the remaining hardware differences go away.
[+] haukem|4 years ago|reply
Does Graviton3 use the Neoverse V1? The Graviton2 used the Neoverse N1.

The features listed here match the core: https://developer.arm.com/ip-products/processors/neoverse/ne...

The N2 is missing bfloat16, but it could be that ARM marketing named it differently: https://developer.arm.com/ip-products/processors/neoverse/ne...

[+] Alex3917|4 years ago|reply
How long until we get T5g with Graviton3?
[+] croddin|4 years ago|reply
Graviton2 was announced at re:invent 2019 and t4g came out in September 2020 so my guess is we will see t5g instances by September 2022.
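That guess is just the Graviton2-to-t4g lag applied forward from the Graviton3 announcement. Roughly, in Python (the exact days are my approximations; only the months are from the comment above):

```python
from datetime import date

g2_announced = date(2019, 12, 3)   # re:Invent 2019 (approximate day)
t4g_launched = date(2020, 9, 10)   # t4g availability (approximate day)
g3_announced = date(2021, 11, 30)  # re:Invent 2021

lag = t4g_launched - g2_announced  # roughly nine months
print(g3_announced + lag)          # 2022-09-08, i.e. about September 2022
```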
[+] shaicoleman|4 years ago|reply
They don't always update the T series instances for every generation, so I wouldn't hold my breath.
[+] kloch|4 years ago|reply
DDR5? Can't wait to see how the memory bandwidth performs. I have a project that is memory bandwidth limited on c6gd instances.
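If you want a quick, very rough sanity check of what an instance delivers before proper numbers land, a stdlib-only copy test gives a lower bound. This is my own crude sketch, not a real benchmark; for anything serious use STREAM:

```python
import time

def copy_bandwidth_gbps(size_mb: int = 64, reps: int = 5) -> float:
    """Crude memory-bandwidth probe: time a large buffer-to-buffer copy.
    Counts 2*n bytes moved (read the source + write the destination) and
    takes the best of several runs to reduce noise."""
    n = size_mb * 1024 * 1024
    src = bytearray(n)
    dst = bytearray(n)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        dst[:] = src
        best = min(best, time.perf_counter() - t0)
    return (2 * n) / best / 1e9

print(f"~{copy_bandwidth_gbps():.1f} GB/s effective copy bandwidth")
```

It understates peak DDR bandwidth (single-threaded, interpreter overhead on setup), but it's enough to see a generational jump like DDR4 to DDR5.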
[+] lostmsu|4 years ago|reply
Any benchmarks? I'd like to see Geekbench 5 results from a full-sized one socket instance.
[+] lizthegrey|4 years ago|reply
No benchmarks yet, but I get 30% lower latency on 2/3 the number of instances compared to c6g, for the same uncompress/compress/write-to-Kafka workload.
[+] givemeethekeys|4 years ago|reply
Side-note: Polly's intonation makes for a fairly confusing listening session.
[+] aclelland|4 years ago|reply
So that's the c6g and c7g, but x86 instances are still on c5. Will AWS ever release a new x86 compute instance, or is this a sign that x86 has reached peak performance on AWS?
[+] pella|4 years ago|reply
(29 NOV 2021) "New – Amazon EC2 M6a Instances Powered By 3rd Gen AMD EPYC Processors"

https://aws.amazon.com/blogs/aws/new-amazon-ec2-m6a-instance...

"Up to 35 percent higher price performance per vCPU versus comparable M5a instances, up to 50 Gbps of networking speed, and up to 40 Gbps bandwidth of Amazon EBS, more than twice that of M5a instances."

"Larger instance size with 48xlarge with up to 192 vCPUs and 768 GiB of memory, enabling you to consolidate more workloads on a single instance. M6a also offers Elastic Fabric Adapter (EFA) support for workloads that benefit from lower network latency and highly scalable inter-node communication, such as HPC and video processing."

"Always-on memory encryption and support for new AVX2 instructions for accelerating encryption and decryption algorithms"

[+] jfbaro|4 years ago|reply
This seems to be a good fit for Lambda as well. Looking forward to seeing more information about the chip design and capabilities.