How we got our AWS bill to around 2% of revenue

298 points| grwthckrmstr | 5 years ago |sankalpjonna.com | reply

232 comments

[+] PaulKeeble|5 years ago|reply
If you end up in the Lightsail world and you are not utilising Amazon's other services, then it's probably cheaper to do this with another provider. Someone like Contabo or Hetzner will give you VMs with similar fixed hardware, just the box, for substantially less a month. At these low scales with a completely open source stack, Amazon isn't good value in my opinion. It becomes valuable as you grow and need the scale it provides.
[+] jbverschoor|5 years ago|reply
AWS initially was never meant for this type of workload.

It started with ephemeral instances, available for a fixed workload, billed by the hour, such as batch jobs.

It turned out people were willing to pay $100 a month for a box worth $30 a month.

[+] amq|5 years ago|reply
Or DigitalOcean, which also offers many of the big-cloud perks like managed db, object storage, load balancers and kubernetes.
[+] root993|5 years ago|reply
Hello, author here.

We definitely considered using other cloud providers like DO, Linode, etc. But it was important for us to go with AWS because we needed some of the other services that AWS provides, like S3, Route53, etc.

Some of our static websites are in fact hosted entirely using CloudFront + s3 combination which is something I forgot to mention in this post :)

[+] tyingq|5 years ago|reply
Also, Lightsail is hot garbage. It has the same crazy aggressive throttles as a T2 instance. You'll have roughly 5% of the original CPU if you run hot for half an hour or so. It is not comparable at all to other cheap VPS options.
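
To put rough numbers on that kind of throttling, here is a simplified CPU-credit model of a burstable instance. Every figure in it (30 starting credits, 3 credits earned per hour, one credit paying for one vCPU-minute at 100%) is an illustrative assumption chosen to match the "half an hour, then about 5%" description, not a published Lightsail spec.

```python
def minutes_until_throttled(balance, earn_per_hour, load=1.0):
    """Minutes of sustained `load` (1.0 = 100% of one vCPU) before credits run out.

    Assumes one credit pays for one vCPU-minute at 100% utilisation.
    """
    burn_per_min = load                   # credits spent per minute
    earn_per_min = earn_per_hour / 60.0   # credits earned per minute
    net = burn_per_min - earn_per_min
    if net <= 0:
        return float("inf")               # at or below baseline: never throttled
    return balance / net


def baseline_fraction(earn_per_hour):
    """Sustainable CPU fraction once the credit balance is empty."""
    return earn_per_hour / 60.0


# Assumed values, for illustration only.
print(round(minutes_until_throttled(balance=30, earn_per_hour=3), 1), "minutes at full load")
print(f"{baseline_fraction(3):.0%} baseline afterwards")
```

Under these assumptions a box running flat out drains its credits in a little over half an hour and then drops to the baseline fraction.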
[+] csdreamer7|5 years ago|reply
Just looked at contabo.com

Am I reading this right? 4 cores, 8 gigs of ram, 200 gigs SSD for 4.99 euro a month? That is around $5.60 USD a month. That feels too cheap to be true. What's the catch over ec2 or Digital Ocean?

[+] davidgerard|5 years ago|reply
yeah, that's literally 4x the cost of my box at Hetzner

mind you, it's very much a pet and not cattle

[+] robertlagrant|5 years ago|reply
This is interesting. Do you have (or know of) a published comparison somewhere?
[+] LogicX|5 years ago|reply
Just to be clear, this would be more appropriately titled How we saved money moving from AWS to VPS providers.

It comes as little surprise to me that the AWS lightsail offering saved them money over traditional AWS services, just as they’d save even more using two dedicated servers with any reasonable provider and being able to have a hot-failover for everything.

At DNSFilter we’ve gone through the evolution of Heroku -> AWS -> VPSs -> Dedicated -> and now are setting up our first colo rack.

I did a price comparison recently between Dedicated, AWS, and Colo. Dedicated is 15% the cost of AWS for our needs, and colo will be 42% the cost of Dedicated for us.
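
Chaining those two ratios gives the colo cost relative to AWS. A quick sketch of the arithmetic, using only the two percentages quoted above (the helper name `relative_cost` is made up for illustration):

```python
def relative_cost(*fractions):
    """Multiply a chain of 'X costs this fraction of Y' ratios."""
    result = 1.0
    for fraction in fractions:
        result *= fraction
    return result


dedicated_vs_aws = 0.15    # from the comment: dedicated is 15% of AWS
colo_vs_dedicated = 0.42   # from the comment: colo is 42% of dedicated

colo_vs_aws = relative_cost(dedicated_vs_aws, colo_vs_dedicated)
print(f"colo is about {colo_vs_aws:.1%} of the AWS cost")
```

The ratios only describe this one workload; they say nothing about staffing or hardware purchase costs.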

Now keep in mind we have dedicated DevOps staff and are running at a very different scale from OP, and such a solution is not for everyone. But I personally have never understood the folks who love to brag about having spent much time and effort optimizing AWS to spin up and spin things down, when for the same cost I could just have a 10x more powerful server sitting there, on all the time to handle a spike in load, and I can utilize the extra resources or get things done faster with faster gear at 1/10th the price.

[+] ies7|5 years ago|reply
In the end, the most optimized way to save AWS cost is to call the AWS sales guy for lunch.

I don't ask for anything, but usually 2-3 days later he calls to tell me I got a few thousand in AWS credits.

[+] r1ch|5 years ago|reply
100% agreed. I always cringe when I see some hot new startup going all out on AWS or other cloud services when a $60/mo dedicated box would have more than covered their needs. No point spending that money until they actually need to worry about scaling, and even then, the value of dedicated servers / colo is so much better if you have the staff to support it. I manage a top 2k site off of some OVH boxes and our bandwidth alone at any of the major cloud providers would be many multiples of our monthly bill.
[+] mwcampbell|5 years ago|reply
> two dedicated servers with any reasonable provider and being able to have a hot-failover for everything.

Except you have to implement that hot failover yourself, rather than using something like EC2 auto-scaling groups or an RDS multi-AZ deployment. I'm not confident that I'd get it right, and the middle of the night is a hell of a time to discover that you got it wrong.

[+] rumanator|5 years ago|reply
> An EC2 instance with 2 virtual cores, 4GB RAM and a storage of 80GB costs roughly 37$ a month and a Lightsail instance with the exact same configuration costs 20$ a month which is almost half the cost!

The author completely failed to do his homework. An a1.large instance is only 37$ if it's an on-demand instance. You pay a high premium to be able to pull the switch in the exact minute you cease to need one.

If he's willing to go with AWS Lightsail with its monthly plan, the same a1.medium instance type is about 11.75$/month as a reserved instance, and can be had for about 131$/year as well.
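
Putting the quoted prices side by side on a yearly basis makes the gap easier to see. This sketch uses only the figures mentioned in this subthread and ignores transfer and storage; the function names are illustrative.

```python
MONTHS = 12


def yearly(monthly_price):
    """Yearly cost of a plan billed monthly."""
    return monthly_price * MONTHS


def saving_vs(baseline_yearly, option_yearly):
    """Fraction saved relative to the baseline."""
    return 1 - option_yearly / baseline_yearly


on_demand = yearly(37.00)          # quoted EC2 on-demand price
lightsail = yearly(20.00)          # quoted Lightsail price
reserved_monthly = yearly(11.75)   # quoted reserved price, paid monthly
reserved_upfront = 131.00          # quoted reserved price, paid yearly

for name, cost in [("on-demand", on_demand), ("Lightsail", lightsail),
                   ("reserved, monthly", reserved_monthly),
                   ("reserved, upfront", reserved_upfront)]:
    print(f"{name:18s} ${cost:7.2f}/yr  saves {saving_vs(on_demand, cost):.0%}")
```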

[+] kijin|5 years ago|reply
The Lightsail instance comes with 4TB/mo of free transfer. Even using a small fraction of that transfer on EC2 will cost you more than the instance itself.

Also, creating an instance on Lightsail isn't exactly a monthly commitment. You are free to delete your instance at any time and only pay for the hours you used. It's a lot more flexible than the multi-year commitment you need to make in order to get the best pricing out of EC2.
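
A back-of-the-envelope check of the transfer point. The per-GB egress rate below is an assumed, illustrative number (check current pricing before relying on it); the instance price is the reserved figure quoted in the parent comment.

```python
ASSUMED_EGRESS_USD_PER_GB = 0.09   # assumption for illustration, not a quoted price


def egress_cost(gb, rate=ASSUMED_EGRESS_USD_PER_GB):
    """Monthly cost of sending `gb` gigabytes out at a flat per-GB rate."""
    return gb * rate


def break_even_gb(instance_monthly_usd, rate=ASSUMED_EGRESS_USD_PER_GB):
    """Monthly transfer at which egress alone costs as much as the instance."""
    return instance_monthly_usd / rate


bundle_gb = 4000                   # the 4TB/mo bundled with the Lightsail plan
threshold = break_even_gb(11.75)   # reserved instance price from the parent comment

print(f"full bundle at the assumed rate: ${egress_cost(bundle_gb):.2f}/mo")
print(f"egress matches the instance price at about {threshold:.0f} GB "
      f"({threshold / bundle_gb:.1%} of the bundle)")
```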

[+] petrbela|5 years ago|reply
There are similarly magic numbers in the database cost. A 2 vCPU/4GB instance is db.t2.medium which comes to about $35/month on demand, plus 120 GB is addt'l ~$13 so it actually comes out cheaper than the lightsail version, and certainly way below the proclaimed $200.
[+] karmakaze|5 years ago|reply
This is one place where Google's cloud platform is better in that it automatically applies continuous use discounts without the guessing games.
[+] callmeal|5 years ago|reply
In addition to data transfer costs, the Lightsail instance also comes with storage, which will cost you extra on the AWS one.
[+] n_u_l_l|5 years ago|reply
Not sure if relevant, but A1 is ARM, so not exactly the same.
[+] ramraj07|5 years ago|reply
I dabble extensively in being dirt cheap with my monthly cloud spend on personal projects, and after much experimentation, I have settled on the following:

1. Elastic Beanstalk, no Docker: EB comes with really nice defaults so that you can quickly whip up a Flask app, upload it and it just works. It provisions a small EC2 instance by default which IIRC costs 10ish a month at best. Importantly, any operation you do with EB will by definition be ready for continuous deployment since you don't get an option to ssh into the machine to deploy. It's extensible enough to add whatever extra stuff you need as well. The only thing it can't do is simple caching (if you scale to more than one instance, that is), but that can be solved by having a separate EB deployment for a worker that takes care of all this aux stuff (ElastiCache is expensive, I think). In a pinch, it also scales well (though the default cheap deployment does not have a load balancer).

2. Just suck it up and go with RDS Postgres. Again, I see too many things going wrong with spinning up your own DB in an EC2 instance, especially if that instance goes down. I'm too lazy to write backup scripts and keep track of them! The cheapest RDS Postgres costs 13 a month or so, but I just suck it up to power whatever side projects I do. Postgres means I'm working with something I know, and I get full text search and pubsub for free. And whatever I write is not locked-in code in any form, and can be scaled up if needed as well. More importantly, I'm only spending my time on technologies that are relevant to my day job, so that's a win.

3. Github Actions to deploy to eb. just a few lines of yaml and you instantly get continuous deployment directly from your repo for free! Really can't beat that.

I have meant to try out heroku since it could be cheaper from what I have read. But I couldn't figure out what their S3 alternative is or how different they are from the canonical cloud offerings.

I'm sure it can be so much cheaper, but I'm not good at advanced networking or sysadmin, and I'm too lazy / bored / disorganised to write deploy scripts or ssh into remote machines. I'm also always afraid of whether, and for how long, I'd need to re-provision a VPS/EC2 instance if it goes down. Not that they do, but they can, and that scares me.

[+] gnaman|5 years ago|reply
Have you come across any cheap ElastiCache alternatives? I'm on AWS free tier and AWS gives you a free 2 vCPU 0.5 GB (or 1 vCPU and slightly more memory) instance which suits me for now. But I was wondering if there are any other managed Redis alternatives? We are just 2 people and we want to keep our stack open to be able to switch providers and I don't know how easy it'd be to switch DB instances.

I have explored redislabs but all in all it seems much more expensive than elasticache. A similar instance from redislabs costs ~$36 vs ~$13 in AWS. My comparison was based solely on capacity

[+] root993|5 years ago|reply
Hello, author here.

1. When I started working on this I was not fully aware of the large array of services that AWS provides and therefore our setup may not be the best possible one. So I would be checking out Elastic beanstalk for sure and see if it is feasible to use for my next venture.

2. Just to clarify, we are not running mysql on a lightsail instance, rather we are using a managed database provided by lightsail which automatically takes backups on its own and also has the option to restore a new DB from an existing backup.

3. Thanks for the insight on EB. Will be looking into it for sure.

The thought of any of these instances going down scares me as well, but I would like to believe that I have set up enough alerts everywhere so I can take immediate action :)

[+] PetahNZ|5 years ago|reply
This is what we do also, but at a slightly higher level. EB web, and worker environments, t3a.small instances, auto scaling between 10-100 instances. And using Aurora serverless RDS.
[+] elondaits|5 years ago|reply
One problem of VPS solutions is that they're easy to maintain until you have to do it. At some point the technical debt comes back and you need to upgrade the stack components and eventually the OS as well, without sacrificing uptime.

... if you did things well, it's just starting another instance and installing / copying things over with some minimum downtime. But if you don't have a 100% documented stack, you don't know which configurations you touched to make things work, you have files lying around, then you're probably going to pay back all your savings and more in the workdays needed to migrate.

At my current employer we never have time for maintenance, and we don't have professional expertise (the "I worked exclusively at this for many years" kind, I mean) at webhosting, security, system admin or dba, so I heavily lean towards more "managed" and cloud solutions.

[+] disgruntledphd2|5 years ago|reply
I completely agree (it's always the small changes that make things work that cause problems), but for this product which is greenfield and has a very small team, it seems like a good fit.

Like, I wouldn't run Shopify like this, or anything really large, but it may get them to a better place right now.

What I really liked about this post was the discussion of what they actually needed and how to achieve their goals. It's a tradeoff of time against money, and I wish more people documented their thought processes and approaches to this kind of stuff.

[+] amelius|5 years ago|reply
But cloud solutions can change their UIs and APIs anytime they want. They can also drop features as they please.
[+] pachico|5 years ago|reply
I read a lot of "you are wrong", "you didn't think about this", etc, which I'm not going to get into. I embrace these posts as an invitation to re-evaluate, with your own data and use cases, your technical decisions and for that I'm always grateful.

On the same note, in case someone is digging into how to reduce CDN bills, I wanted to share that we are quite happy with BelugaCDN. It distributes objects stored in S3 using, in a hacky way, referrals as authentication method. Lots of money saved there.

[+] lend000|5 years ago|reply
A bit off topic, but I bet someone here knows. When running an EC2 instance and not using all of the cores on the socket (for example, using a c5.large instance instead of a c5.12x large, which gives you all the cores on the socket), you presumably are sharing your L3 cache with your neighbors on the same socket, because that's how the processor is designed.

Is there a way that the hypervisor allocates a dedicated portion of the shared L3 cache to just your instance, or is it a free for all for all of the L3 cache space against potentially noisy neighbors?

[+] nisten|5 years ago|reply
Yeah, that is true, you are sharing L3 cache. To mitigate some of the recent Intel issues, I think AWS actually has their own chip now on newer motherboards to handle the hypervisor duties securely.

Otherwise, they'd do it in software patches for older CPUs and take the performance hit of the patch.

I'm not sure how much of the L3 the hypervisor would reserve; it is likely a free-for-all. However, you'd still have quite a bit of dedicated L2 and L1 on most Xeons. With AMD's first-gen EPYC it's a little different, because clusters of cores share a cache and you can get weirdly high latencies depending on which cores you're using (e.g. cores 8 and 9 being too far apart).

Also, according to this AnandTech article, the average total CPU load for physical AWS machines is ~60% and is actively balanced out by them. And yes, running benchmarks on a machine without noisy neighbors yields very significant improvements, up to 2x better on the benchmark scores. They measured this by comparing renting out all the cores of a machine vs renting only the 4 (or however many) they needed.

https://www.anandtech.com/print/15578/cloud-clash-amazon-gra...

I'm assuming they'd put a CPU into sleep/hibernate mode in order to save power instead of having it only run at 5% utilization.

[+] Keyframe|5 years ago|reply
Sounds like a textbook example of what a side channel attack would look like.
[+] KenCochrane|5 years ago|reply
Maybe I missed something, but how do you handle the fact that your nginx server is a single point of failure? If that goes down, traffic can’t get to your web servers.

Do you have more than one, and DNS load balance, or do you just live with the risk?

One of the main reasons why I use an ALB/ELB is so that I don’t have that SPOF. If you found a way around that, please share, I would love to know, so I can save some money :)

[+] n_u_l_l|5 years ago|reply
His database also is.

I think it's highly unprofessional to use a setup like this in production. Looking at his product, it seems like a product whose downtime has a big impact on their clients.

[+] root993|5 years ago|reply
I hate to admit it, but the nginx server is in fact a SPOF.

We currently handle it by setting up alerts all over the place so I can take quick action if something goes sideways, but other than this I have not really found a way around it.

We also have latest snapshots ready of all our instances so that I can get another server running ASAP during a calamity.

[+] mattbillenstein|5 years ago|reply
Their setup is trivial - they could do it at 0.2% of revenue with a cheap vps on Linode or other...
[+] nordsieck|5 years ago|reply
> Their setup is trivial - they could do it at 0.2% of revenue with a cheap vps on Linode or other...

Lightsail has pretty comparable pricing to Linode. I'm sure they could re-architect their app to use fewer instances, but they could do that and stay on Lightsail as well.

Moving to linode isn't going to give them a 90% savings.

[+] root993|5 years ago|reply
Hello, Author here.

While this is true, we wanted to go with AWS for 2 primary reasons

1. We use some of the other services provided by AWS like s3 and Route53.

2. Just the reliability and brand that AWS has is something that we had to take into consideration

[+] elcomet|5 years ago|reply
It's the difference between pets and cattle.

They are using Lightsail to create pets, which doesn't take much configuration. When you need cattle, you need more config and a more complex setup, which costs money.

As they say in the end, this is because they are a micro startup and don't need a huge scalable infrastructure.

It's funny, as it seems to be the inverse of economy of scale. The more you grow, the higher the marginal cost. But I think that Amazon gives discounts to large users.

[+] scarface74|5 years ago|reply
For context, I know the ins and outs of most of the core AWS services really well from the dev, DevOps, and ops side.

But, my advice tends to be Lambda first if it is really low volume, LightSail second, and full AWS third.

As far as Lambda, I often recommend proxy integration, where you can just use the standard API framework for whichever language you choose (Django, Flask, Express, ASP.Net, etc) add three lines of code and push the entire thing into Lambdas. This gives you the flexibility to deploy to a VM later with no code changes.
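
A rough sketch of what such an adapter does under the hood, using only the standard library: it turns the API Gateway proxy event into a WSGI call and turns the WSGI response back into the dict API Gateway expects. The names `app` and `lambda_handler` and the tiny stand-in app are hypothetical; in practice an existing adapter package does this for you, and it would also handle base64 bodies and multi-value headers, which this sketch ignores.

```python
import io
import sys
from urllib.parse import urlencode


def app(environ, start_response):
    """Tiny stand-in WSGI app; Flask and Django apps expose the same interface."""
    body = f"hello from {environ['PATH_INFO']}".encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def lambda_handler(event, context):
    """Translate an API Gateway proxy event into a WSGI call and back."""
    body = (event.get("body") or "").encode()
    environ = {
        "REQUEST_METHOD": event.get("httpMethod", "GET"),
        "SCRIPT_NAME": "",
        "PATH_INFO": event.get("path", "/"),
        "QUERY_STRING": urlencode(event.get("queryStringParameters") or {}),
        "SERVER_NAME": "lambda",
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    # Request headers become HTTP_* keys (a real adapter special-cases
    # Content-Type and Content-Length, which WSGI stores without the prefix).
    for name, value in (event.get("headers") or {}).items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value

    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return {
        "statusCode": int(captured["status"].split()[0]),
        "headers": dict(captured["headers"]),
        "body": b"".join(chunks).decode(),
    }
```

Because the application itself only speaks WSGI, the same `app` object can later be served by a regular WSGI server on a VM without code changes.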

For your static assets use S3, except for the case of Lightsail where you get plenty of free bandwidth.

[+] root993|5 years ago|reply
Hello, author here.

I totally agree on Lambda first. The only reason we did not do that is because when we started this product I was not well versed with Lambda and serverless and preferred to work with something that I dealt with previously.

If I could go back in time, I would set up all our applications on serverless.

[+] lowmemcpu|5 years ago|reply
> I often recommend proxy integration

Could you go further into what this is? Do you mean the API Gateway proxy integration? Is there a code sample somewhere?

[+] ChicagoDave|5 years ago|reply
I’m wondering if Lambdas and DynamoDB would be competitive to this setup.
[+] peterwwillis|5 years ago|reply
You can shop for deals on VPSes at lowendbox.com. But if you're trying to run a business, this is a waste of time. Find a provider which is highly reliable, which can also automatically rebuild all failed infrastructure with no intervention needed. That eliminates 99.999% of the providers out there. You're paying a premium to never have to think about your tech again, so you can focus on the business.

Besides using the free tier and other AWS services which are practically free at low uses, you can use cost effective options like Fargate Spot Instances and EC2 Reserved Instances. I highly recommend Fargate over running instances. Use Lambda with CloudWatch triggers if you need to schedule occasional jobs (or use Fargate's feature for that). Try to avoid heavy reliance on caches, ElastiCache is kind of a rip off. Move as much content to static as possible, use CloudFlare to reduce bandwidth costs. If you're gonna serve over S3, you might as well front it with CloudFront as it's actually cheaper due to caching at the edge, and also more reliable. ALBs are expensive but very useful for APIs as well as autoscaling (if you have to run instances, run them with an ASG, which also means having versioned AMIs)

[+] barnabask|5 years ago|reply
I'm interested in the managed DB part of this writeup, specifically that OP chose Lightsail. The last time I looked into it, Lightsail was MySQL only, so that was good to know.

I wrote a PostgreSQL DBaaS Calculator that got some traction here a while ago, and I just updated it this evening to add Lightsail to see how it stacks up: https://barnabas.me/articles/postgres-dbaas.html#calculator

No surprise, Lightsail is similar pricing to RDS with a 1-year commitment, but it's month-to-month. It's a pretty good deal until you need more than 8 GB memory, 240 GB storage, 2 cores ($115/month Standard plan). But Azure or AWS RDS are the ones to beat.

[+] anthony_barker|5 years ago|reply
When I did data center stuff for a large company the biggest cost was network - not servers. Specifically network security and redundancy.

It's amazing to me that these providers include all of this basically for free, except for application-level filtering.

[+] EdwardDiego|5 years ago|reply
Yep, I was consulting for a firm with an already sizeable infrastructure investment in and dependence on AWS. Their architects were proud of their best-practice use of multiple AZs for redundancy, but didn't know that inter-AZ traffic also costs money. So I imagine they're going to have fun trying to incorporate that into their infrastructure layout.

In my day job we saved $600 a day by cutting down on needless inter-AZ traffic.

[+] tzs|5 years ago|reply
> An EC2 instance with 2 virtual cores, 4GB RAM and a storage of 80GB costs roughly 37$ a month and a Lightsail instance with the exact same configuration costs 20$ a month which is almost half the cost!

In general I'm not sure you can really compare instances from different services by just looking at cores, RAM, and disk.

I used to have my small website and email server at Rackspace, on the smallest, cheapest instance. I ended up getting out of mail hosting (moved that to Fastmail), and putting the small website on Lightsail.

The Lightsail instance and the old Rackspace instance have the same nominal specs for virtual cores, RAM, and disk. (Actually, Lightsail may be better on disk--I don't recall if the Rackspace one was SSD like Lightsail).

The main thing the website actually does is host some graphs showing the temperature in my house. An ESP8266 temperature monitor I made uploads a sample once a minute. A script was running once a minute on the server that used gnuplot to make graphs of temperature over the last hour, 3 hours, 12 hours, 48 hours, and over all samples.

At Rackspace it ran that fine, while at the same time gathering my mail from several services via fetchmail, receiving mail for my domain on its smtp server, and running spam filtering.

On Lightsail just handling the website it was locking up every few hours. I managed to catch one pre-lockup and found the load average was something like 300.

What was happening was sometimes that once a minute graph generation task was taking longer than a minute, and that would slow down the next one, and so on. Oddly, it didn't seem to be gnuplot that was taking too long, but rather the script that took the file containing all the samples, and extracted just the samples newer than a specific threshold. At least, running everything by hand that was the only step I ever saw take unusually long.

I changed it to every 5 minutes to temporarily stop the frequent hanging, and then added a check to my script to skip regenerating the graphs if a previous instance of the script was still running to fix things permanently. I also changed it from storing the data in just a big file of samples sorted by time to an sqlite DB.
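
The "skip if the previous run is still going" check can be sketched with a non-blocking file lock. This is a minimal Unix-only illustration, not the script described above; the lock path and the `regenerate_graphs` placeholder are hypothetical.

```python
import fcntl
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "graphs.lock")  # hypothetical path


def try_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def regenerate_graphs():
    """Placeholder for the real graph generation work."""


def main():
    lock = try_lock()
    if lock is None:
        return "skipped"    # a previous run is still in progress
    try:
        regenerate_graphs()
        return "ran"
    finally:
        lock.close()        # closing the file releases the lock


if __name__ == "__main__":
    print(main())
```

Because the kernel drops the lock when the process exits, a crashed run cannot leave a stale lock behind, which is the usual weakness of checking for a PID file.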

[+] fxtentacle|5 years ago|reply
TLDR: Use Lightsail, Amazon's approximation of a dedicated server.

In my opinion, that approach sacrifices pretty much all the benefits that cloud proponents usually talk about, like only paying for what you actually use or scaling up and down on demand.

In fact, I'm not sure what the actual difference would be between Lightsail and a good dedicated hoster that provides backup and failover services.

[+] alain_gilbert|5 years ago|reply
The difference would be:

- They could have much better service

- And bring down their expenses by at least half again

I used lightsail, and it's a terrible service.

It is very expensive (still) compared to other hosting providers.

And if your instance ever runs out of memory, it becomes unresponsive until you manually restart it.

On other providers that I use, the OS would just "sacrifice child" (kill the process) and restart it.

It's not ideal, but much better than having to go there yourself to restart the whole thing.

[+] barrenko|5 years ago|reply
Slightly off-topic, but is there a book to learn about this kind of stuff, preferably without having to sign up for AWS, but that does have code samples and whatnot?
[+] t0mas88|5 years ago|reply
I don't understand why this is so special? Ours is about 4% for a very compute heavy service handling millions of events per day across four continents.