Cloud cost optimisation is underrated. In the companies I've worked in nobody has really given a shit (at least not under normal economic circumstances). In the industry there's a strong avoidance of ARM compute instances for no good reason. If I were building from scratch today I would definitely go with Graviton.
At $dayjob I found an unused box in the cloud running an expensive database engine. It was idle for months, created to be used by a consultant on a project that had wound up. The consultant had quit his consultancy on top of this.
I was told in no uncertain terms not to even think of touching this VM because “the budget has been approved”.
I was shocked at the flagrant waste of money and assumed it was a one-off aberration.
Nope, for months afterwards I kept hearing the same refrain from manager after manager, from product owners and dev team leads.
“Don’t touch! We fought hard for this budget! You’ll take it from our cold dead hands!”
Eventually I soured on the whole idea of cloud cost optimisation as a service for unmotivated third parties and gave up on the notion.
I think the main reason is "I want to run the same binaries locally that I run in the cloud," and it's a pretty valid one. However, it's also an expensive one sometimes.
Completely agree... the only exception I've run into is that for small operations build tooling often doesn't work well with arm64.
E.g. GitHub Actions can build a container in a few minutes on x64 or 35 minutes on arm64... likewise aws-cdk literally could not run an arm64 Fargate ECS deployment for months after support was added (it simply did not support the required attribute in the container definition).
I would love to see this change as I've had nothing but great experiences with graviton for virtually anything arm supported.
First, Graviton is not magic. We switched our main service, which is a Node.js monolith, and did not get any cost improvement (we had to add more instances to handle the same workload, which ended up being equivalent cost-wise). There are certainly use cases where it's better, but it doesn't seem to be the one obvious choice for all use cases.
Second, our laptops and our CI are amd64 machines, and being able to run the same Docker images in prod and locally is nice, and not having to build the image with QEMU on the CI is also good.
I don't mind cloud-ARM, but there definitely are good reasons not to use it (which of course don't apply to everyone)
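To make the "more instances, same bill" effect concrete, here is a minimal sketch in Python. The function name and every number in it are hypothetical, chosen only for illustration; real per-instance throughput has to come from your own load tests.

```python
import math


def fleet_cost_per_hour(target_rps, rps_per_instance, price_per_hour):
    """Return (instance_count, hourly_cost) for the smallest fleet serving target_rps."""
    instances = math.ceil(target_rps / rps_per_instance)
    return instances, instances * price_per_hour


# Hypothetical figures: the ARM instance is 20% cheaper per hour,
# but each instance also handles 20% fewer requests per second.
x86_fleet = fleet_cost_per_hour(target_rps=10_000, rps_per_instance=1_000, price_per_hour=0.20)
arm_fleet = fleet_cost_per_hour(target_rps=10_000, rps_per_instance=800, price_per_hour=0.16)

print("x86:", x86_fleet)
print("arm:", arm_fleet)
```

With these made-up inputs the cheaper instance needs 13 machines instead of 10, so the hourly bill comes out roughly the same. Rounding up to whole instances matters more for small fleets.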
Switching to Graviton isn't an automatic cost saving. Everything is optimized for x86. It may be cheaper, but significantly slower. We've been trying to migrate for the last year, both for the cost savings and because we switched to ARM-based laptops.
This. We have three people entirely dedicated to reducing costs.
As for avoiding ARM, we do only x86-64 because corporate security policy demands that we have Windows laptops so that some box ticking overlord can fill out a security policy compliance form. That means we're stuck limping along with docker and WSL2. Every single engineer in the org has an arm64 machine at home already and wants a proper computer at work, which can ironically work in the same policy framework if anyone gave enough of a shit to deal with it.
So that's why we don't use Graviton; corporate security policies. Our customers will just have to eat the price hikes.
Agree, I've gone to Graviton instances by default for RDS and ElastiCache (I run and own a DevOps consulting company). The big problem that I continue to deal with is native arm64 Docker containers (if you're a cool kid running containers / Kubernetes). For example, the very popular Bitnami charts don't support arm builds even though the community has been screaming for support.
> In the industry there's a strong avoidance of ARM compute instances for no good reason.
Not no reason; it adds work and risks incompatibility. Now, that work might be relatively small, and most software these days is compatible with aarch64, but compared to amd64 (which is the de-facto standard, already supported by everything, the default without needing to set anything up) it's still something, and businesses are risk-averse.
The biggest downside I've found with Graviton is that it's gotten popular enough that availability of capacity is a problem in some regions/AZs - particularly if you're using larger EC2 instance types.
Also, Fargate Spot on Graviton is still not available, so if you want to run Spot in non-production environments, you're faced with running different architectures in prod vs. non-prod, which I don't like at all. Do the math on whether it's cheaper for your use case to go x86 spot/non-spot vs. Graviton non-spot.
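"Doing the math" can be as small as the sketch below. The hourly prices and the spot discount are placeholders, not real AWS prices, and the helper names are invented for this example; substitute the current figures for your region and task size.

```python
def monthly_cost(hourly_on_demand, spot_discount=0.0, hours=730):
    """Monthly cost of one task; spot_discount is a fraction (0.7 means 70% off)."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hourly_on_demand * (1.0 - spot_discount) * hours


def cheapest(options):
    """Return the name of the option with the lowest cost."""
    return min(options, key=options.get)


# Placeholder prices, for illustration only.
options = {
    "x86 on-demand": monthly_cost(0.040),
    "x86 spot": monthly_cost(0.040, spot_discount=0.70),
    "graviton on-demand": monthly_cost(0.032),
}

for name, cost in sorted(options.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f}/month")
print("cheapest:", cheapest(options))
```

This ignores interruption handling and any per-instance throughput difference between the architectures, both of which can change the answer.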
I found graviton to be a mixed bag. It was certainly extremely fast when using the very high end instances and I tested it successfully using a Rust based message queue system I was writing and it got some ridiculously fast number like 8 million messages a second, from memory, using the fastest possible graviton instance (this was about 18 months ago).
I did try to switch some of my database servers to it a couple of years ago and after random hangs, I gave up and went back to Intel. I tried again further down the track and same thing - random hangs. I assume this sort of thing comes with a new architecture but I'd be hesitant to move any production infrastructure to it without extensive long term testing.
In the case of graviton based GPU instances I found that the GPU enabled software I wanted to use didn't work.
If you are comparing performance, I'd suggest buying a fast AMD machine, running it locally and comparing performance - local servers tend to be much faster and cheaper than cloud. And if your application uses GPUs, then if you possibly can, it's very much in your interests to run local servers.
Arm has a much looser memory model than x86 [1 for a comparison]. It's possible that the random hangs are due to a race condition in PG that doesn't show up in x86 because memory visibility doesn't require as much synchronization.
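For anyone who wants to see what "looser" means, here is a toy Python enumeration of the classic message-passing test. It is a sketch of the idea only, not a model of real hardware and not a diagnosis of the hangs above; all names are made up for illustration. A writer stores `data` then `flag`; a reader loads `flag` then `data`. x86 keeps store-store and load-load order, so the reader can never see the flag set while the data is stale. Arm may reorder those accesses unless the code uses barriers or acquire/release operations.

```python
from itertools import permutations

# Each operation is (kind, location, value_or_register).
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r_flag"), ("load", "data", "r_data")]


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule against shared memory; return (r_flag, r_data)."""
    memory = {"data": 0, "flag": 0}
    registers = {}
    for kind, location, arg in schedule:
        if kind == "store":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return registers["r_flag"], registers["r_data"]


def outcomes(allow_reordering):
    """Collect all (r_flag, r_data) results; optionally let each thread reorder its own ops."""
    writer_orders = list(permutations(WRITER)) if allow_reordering else [tuple(WRITER)]
    reader_orders = list(permutations(READER)) if allow_reordering else [tuple(READER)]
    seen = set()
    for writer in writer_orders:
        for reader in reader_orders:
            for schedule in interleavings(list(writer), list(reader)):
                seen.add(run(schedule))
    return seen


print("program order kept:", sorted(outcomes(False)))
print("reordering allowed:", sorted(outcomes(True)))
```

The result `(1, 0)`, meaning "flag seen but data stale", only shows up once reordering is allowed. Code that silently depended on the stronger ordering is the kind that can misbehave after a port.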
There are huge differences in the machine generations. We found that for our workload Graviton3 (c7g) is the best, followed by AMD (m6a), followed by Intel (m6i) with Graviton2 (m6g) somewhat lagging. We can't use Graviton3 however because of memory limitations, so we're using AMD. The difference to the old machine types (m5) is staggering, the m6a is basically twice the performance of m5, while being cheaper.
However, I've seen a lot of benchmarks telling a different story, so it is important to actually measure your workloads.
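A rough sketch of what "measure your workloads" could look like in Python: time the same job on each candidate instance type and then normalise by price. The helper names, the stand-in workload and the price used below are all hypothetical.

```python
import statistics
import time


def measure(workload, repeats=5):
    """Return the median wall-clock seconds of workload() over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def throughput_per_dollar(units_of_work, seconds, price_per_hour):
    """Units of work completed per dollar, given one measured run time."""
    units_per_hour = units_of_work / seconds * 3600
    return units_per_hour / price_per_hour


def sample_workload():
    # Stand-in for a real job; replace with something representative.
    sum(i * i for i in range(100_000))


seconds = measure(sample_workload, repeats=3)
# 0.17 is a placeholder hourly price, not a real quote.
print(f"median run: {seconds:.4f}s, work per dollar: {throughput_per_dollar(1, seconds, 0.17):.0f}")
```

Run the same script on each instance type under consideration and compare the work-per-dollar figures, not the raw timings.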
I’ve been enjoying them here and there but I’ve also found that for some of my workloads a high clock Intel node is required. Even the Epyc nodes couldn’t keep up. I don’t completely know why, never dug too far into it.
> local servers tend to be much faster and cheaper than cloud.
Of course, running a server in your house is not going to achieve five or even three 9's of reliability, and even colocating a single rack in a single location might be more expensive than putting that infra in AWS (depending on how data-heavy your use case is, given AWS' exorbitant data transfer costs).
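For a sense of scale, the downtime budget behind "three 9's" versus "five 9's" is simple arithmetic. A minimal Python sketch (the function name is just for illustration):

```python
def allowed_downtime_minutes_per_year(nines):
    """Downtime budget in minutes per 365-day year for an availability of N nines."""
    unavailability = 10 ** -nines      # e.g. 3 nines -> 0.001
    minutes_per_year = 365 * 24 * 60   # 525,600
    return unavailability * minutes_per_year


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes_per_year(n):.2f} minutes/year")
```

Three nines allows a bit under nine hours of downtime a year, while five nines allows only about five minutes.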
I'm interested in hearing more about their switching to Graviton with Clickhouse.
We've been testing ClickHouse on Graviton and the performance isn't there for a variety of reasons, most notably, it seems, because ClickHouse for arm64 is cross-compiled and JIT isn't enabled like it is for amd64[1].
For AWS managed resources definitely use Graviton. But for spot instances in EC2 we've found better pricing and greater availability by staying on x86. (We run 100% of our web services and background workers on spot instances).
Same experience here. I cut over a whole bunch of instances to Graviton a while back and it "just worked" for a lot of our workloads. Test it first, obv.
Another easy cost-savings switch is telling Terraform to create EC2 root volumes using gp3 rather than gp2 (gp3 is ~10% cheaper and theoretically more performant). The AWS API and Terraform both still default to gp2 (last I checked), and I wonder how many places are paying a premium for it.
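A quick way to estimate what the gp2 to gp3 switch is worth, sketched in Python. The per-GiB prices are parameters because they vary by region and over time; the values in the example are placeholders chosen to match the "~10%" figure above, not quoted AWS prices.

```python
def gp3_monthly_savings(volume_sizes_gib, gp2_price_per_gib, gp3_price_per_gib):
    """Estimated monthly storage savings from moving the given volumes from gp2 to gp3."""
    total_gib = sum(volume_sizes_gib)
    return total_gib * (gp2_price_per_gib - gp3_price_per_gib)


# Hypothetical fleet of root volumes (GiB) and placeholder prices per GiB-month.
root_volumes = [100, 500, 50]
savings = gp3_monthly_savings(root_volumes, gp2_price_per_gib=0.10, gp3_price_per_gib=0.09)
print(f"estimated savings: ${savings:.2f}/month")
```

This only covers the storage charge. Any IOPS or throughput provisioned above the gp3 baseline is billed separately and is not modelled here.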
AWS Graviton is interesting because it is a pretty different machine from their AMD and Intel offerings. A "16 vCPU" machine from AWS is 8 cores/16 threads, not 16 cores - except for Graviton, which actually has 16 cores, although much weaker ones. So for problems where the cores can actually work in parallel, Graviton can keep up with AMD and Intel, while being somewhat cheaper. In single-threaded workloads you get about half the performance.
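The vCPU accounting can be sketched in a few lines of Python. The per-core speed and SMT uplift figures below are hypothetical knobs for illustration, not measurements.

```python
def physical_cores(vcpus, threads_per_core):
    """Physical cores behind a vCPU count (2 threads per core with SMT, 1 without)."""
    if vcpus % threads_per_core != 0:
        raise ValueError("vcpus must be a multiple of threads_per_core")
    return vcpus // threads_per_core


def aggregate_throughput(vcpus, threads_per_core, per_core_speed, smt_uplift=0.0):
    """Very rough parallel throughput: cores x relative core speed x (1 + SMT gain)."""
    cores = physical_cores(vcpus, threads_per_core)
    return cores * per_core_speed * (1.0 + smt_uplift)


# Hypothetical relative numbers for a "16 vCPU" instance of each kind.
x86 = aggregate_throughput(16, threads_per_core=2, per_core_speed=1.0, smt_uplift=0.25)
arm = aggregate_throughput(16, threads_per_core=1, per_core_speed=0.6)
print("x86 cores:", physical_cores(16, 2), "relative throughput:", x86)
print("arm cores:", physical_cores(16, 1), "relative throughput:", arm)
```

With these assumed inputs the two land close together on parallel work, while a single thread only ever gets `per_core_speed`.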
The second thing I'm curious about is this very AWS-heavy approach. ECS, CodeDeploy, ElastiCache. If I were their architect, I would probably go EKS, GitHub/GitLab, Redis on EKS, just for the peace of mind.
ECS is so much simpler to use and understand than Kubernetes, even on EKS.
But as for CodeDeploy... IMHO the only reason to use it is "I don't want to deal with another vendor" due to procurement/compliance hell in large companies.
We have a very similar story at my org. We run around 100 RDS Aurora clusters and switched to Graviton. I'm surprised to see 35% gains here; we saw more like 10-15%. But since Amazon natively supports MySQL on Aurora we didn't have to worry about compatibility. Our main highlight was the way we wrote our infra as code: we made switching instance types or services a fairly simple task, so we have switched instance types a couple of times in the past and could easily make dev use t3s.
Getting on the cloud is a trap; it's not the usual "we deploy on the servers and we're live" situation. Put real effort into writing good code to manage your infra so you can adopt optimizations as they appear. Otherwise the expense will ramp up quickly.
I'm not sure I understand the point of this article: in theory they don't depend on x86-only code, so they've switched to ARM and it worked, as expected, and things are cheaper.
I'm happy that they've shrunk their bill, but I somehow expected some kind of 'unfortunately, things went wrong because of bizarre memory model issues causing difficult concurrency bugs'.
You can get those bugs when you are doing your own atomics and your code relies on x86's stronger memory ordering, which Arm's more relaxed model does not guarantee. It looks like their code is JS and Go, which buries that stuff. Services they use were already proven out on ARM. (Or, maybe, are not on ARM?)
x86's stricter memory-ordering semantics impose a pretty substantial performance cost. Depending on how they are billed, this might account for a big chunk of their lower cost. But probably not.
Their real problem is that they are firmly entrenched in proprietary Amazon services, so switching to another cloud would be very difficult. Amazon can raise prices 35% anytime, and what can they do?
Not saying what you're switching from makes the title a bit click-baity.
The current title implies that AWS (Graviton or not) is somehow cheaper than other things, when AWS is quite often one of the priciest options out there.
"Switching from AWS to AWS Graviton slashed our infrastructure bill", on the other hand, is an article worth examining.
I have only ever used AWS, what is the go to cloud provider these days? I know GCP and Azure are catching up, but are people just going back to renting some boxes in a data center and just hosting their stuff on there?
I upvoted to compensate for the downvotes, but I'm really curious why you think it's a match made in heaven vs. just using a random x86 laptop and an x86 cloud instance?
My take is that you want to cheer Arm on in this space, not because there is some huge technical advantage, or because the arch meets your fancy or whatever, but because it adds another competitor to the space. One that brings its own baggage, but having three or more competitors competing to be the best (Intel/AMD/various Arm vendors) is a good thing for the industry.
But blanket fanboyism is just harmful. The mid-2010s, with all the Intel fanboys talking up Intel while it screwed everyone with low-clocked server processors limited to 2 cores copy-pasted into their laptops when you could buy a freaking phone with 8 cores, is what you get when one company gets too much market share or too far ahead of everyone else. The same thing is going to happen if Graviton becomes the dominant platform, except it's going to be a case where you won't be able to buy competitive on-prem hardware, or any number of other shortcomings. Or, like the Mac, a piece of hardware which isn't technically locked down, but will likely always have subpar support for running any operating system not shipped by Apple, and could be locked down tomorrow without affecting their business one bit.
So, careful what you wish for. You want competitors that show the giant monopoly that maybe designing a processor for an actual laptop is advantageous over shoveling whatever leftovers from the hyperscalers happen to exist. You also want competitors that show up and pack 2x the cores at 1/2 the price. Or competitors that show up with huge power-hungry processors that are pushing the limits of single-threaded, high-GHz processors, or 500W GPUs, because that is what some people need/want.
It is unless you're the first poor soul to embark on the journey with lots of x86_64 buildup. Having said that, it's been fun though so far. Managed to migrate our dev local k8s toolchain and been using buildx to make multiarch images and manifests for our internal stuff.
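For anyone who hasn't tried it, a multi-arch build with buildx boils down to one command. Here is a small Python sketch that assembles it; the helper name and the image tag are hypothetical, while `--platform`, `--tag` and `--push` are standard `docker buildx build` flags.

```python
import shlex


def buildx_command(image_tag, platforms=("linux/amd64", "linux/arm64"), context=".", push=True):
    """Build the argv for a multi-arch `docker buildx build` invocation."""
    argv = ["docker", "buildx", "build", "--platform", ",".join(platforms), "--tag", image_tag]
    if push:
        # Multi-platform results are normally pushed to a registry as one manifest list.
        argv.append("--push")
    argv.append(context)
    return argv


command = buildx_command("registry.example.com/team/app:1.0")
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

On an amd64-only CI runner the arm64 half of this build runs under emulation, which is where the slow build times mentioned elsewhere in the thread come from.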
Since your core sales feature is "privacy friendly" which will surely be appreciated in the EU, it might make sense to offer local hosting or self-hosting.
> I believe using Amazon AWS already disqualifies you from being fully GDPR-compliant.
AFAIK, nowhere does GDPR say that your data has to reside on an EU server.
However, if I understand the Shopify legality complaint, it's saying "because your data is hosted by a US entity and theoretically could be accessed by the US authorities, it means the US authorities are now part of the data custody and you can't guarantee that they don't also have that data". That's a legal grey area with a lot of political ramifications.
c7DJTLrn|3 years ago
jiggawatts|3 years ago
pclmulqdq|3 years ago
arecurrence|3 years ago
forty|3 years ago
andrewstuart|3 years ago
adrr|3 years ago
gryf|3 years ago
nodesocket|3 years ago
hdjjhhvvhga|3 years ago
tester756|3 years ago
yjftsjthsd-h|3 years ago
rjh29|3 years ago
mk89|3 years ago
Thaxll|3 years ago
beaviskhan|3 years ago
andrewstuart|3 years ago
axiak|3 years ago
1: https://www.nickwilcox.com/blog/arm_vs_x86_memory_model/
glogla|3 years ago
no_wizard|3 years ago
GCP, Azure, Supabase, Cloudflare etc if you want managed services.
If you want a mix of managed services and raw compute, look more at Fly.io, Linode, Digital Ocean perhaps?
I have found the chances of AWS being the "cheapest" or even "reasonable" in the cost department to be slimmer every year.
whalesalad|3 years ago
Dowwie|3 years ago
judge2020|3 years ago
neodypsis|3 years ago
ecliptik|3 years ago
1. https://fosstodon.org/@manish/109397948927679076
phamilton|3 years ago
kodyo|3 years ago
glogla|3 years ago
dserodio|3 years ago
fuzzyengineer|3 years ago
Agingcoder|3 years ago
What am I missing?
moloch-hai|3 years ago
MuffinFlavored|3 years ago
What's Intel's response to this as a company? I know that isn't mentioned in the article but... just curious
Does Intel have any ARM offering whatsoever?
Does AMD have any ARM offering?
johnklos|3 years ago
matt-p|3 years ago
andrewxdiamond|3 years ago
makestuff|3 years ago
kaustubhvp|3 years ago
lemonJS|3 years ago
Terretta|3 years ago
Everyone please do this so we collectively fix all the things to work with this. :-)
StillBored|3 years ago
_joel|3 years ago
fxtentacle|3 years ago
mbesto|3 years ago
According to Shopify this doesn't make it illegal: https://www.shopify.com/de/blog/shopify-dsgvo-konform-deutsc...