Why we moved from AWS RDS to Postgres in Kubernetes

[+] nunopato|3 years ago|reply

(Nhost)

Sorry for not answering everyone individually, but I see some confusion duo to the lack of context about what we do as a company.

First things first, Nhost falls into the category of backend-as-a-service. We provision and operate infrastructure at scale, and we also provide and run the necessary services for features such as user authentication and file storage, for users creating applications and businesses. A project/backend is comprised of a Postgres Database and the aforementioned services, none of it is shared. You get your own GraphQL engine, your own auth service, etc. We also provide the means to interface with the backend through our official SDKs.

Some points I see mentioned below that are worth exploring:

- One RDS instance per tenant is prohibited from a cost perspective, obviously. RDS is expensive and we have a very generous free tier.

- We run the infrastructure for thousands of projects/backends which we have absolutely no control over what they are used for. Users might be building a simple job board, or the next Facebook (please don't). This means we have no idea what the workloads and access patterns will look like.

- RDS is mature and a great product, AWS is a billion dolar company, etc - that is all true. But is it also true that we do not control if a user's project is missing an index and the fact that RDS does not provide any means to limit CPU/memory usage per database/tenant.

- We had a couple of discussions with folks at AWS and for the reasons already mentioned, there was no obvious solution to our problem. Let me reiterate this, the folks that own the service didn't have a solution to our problem given our constraints.

- Yes, this is a DIY scenario, but this is part of our core business.

I hope this clarifies some of the doubts. And I expect to have a more detailed and technical blog post about our experience soon.

By the way, we are hiring. If you think what we're doing is interesting and you have experience operating Postgres at scale, please write me an email at [email protected]. And don't forget to star us at https://github.com/nhost/nhost.

[+] akrymski|3 years ago|reply

Indeed RDS was never designed to be "re-sold", and assuming that a single PG instance will handle lots of different users is naive. Turns out if you're aiming to be an infra provider, building your own infra is the way to go. Who would have thought?

If I was launching a BaaS I wouldn't touch AWS. Grab a few Hetzner bare metal servers and setup your infra. You're leaving a massive profit margin to AWS when you don't have to.

[+] fmajid|3 years ago|reply

Are you using a Kubernetes PostgreSQL operator like pgo or CloudNativePG?

https://proopensource.it/blog/postgresql-on-k8s-experiences

[+] cloudbee|3 years ago|reply

And what are your cost savings from RDS perspective. I'd a similar problem where we'd to provision like 5 databases for 5 different teams. RDS is really expensive. And your solution is open source ? I would like to try.

[+] MBCook|3 years ago|reply

So they switch from one giant RDS instance with all tenants per AZ to per-tenant PG in Kubernetes.

So really we don’t know how much RDS was a problem compared to the the tenant distribution.

For the purposes of an article like this it would be nice if the two steps were separate or they had synthetic benchmarks of the various options.

But I understand why they just moved forward. They said they consulted experts, it would also be nice to discuss some of what they looked or asked about.

[+] raffraffraff|3 years ago|reply

Yeah. I mean, if you're going to use AWS database service for this use case, something that automatically scales based on load makes more sense, like Aurora Serverless. But that's also expensive. Regardless of cost, plain RDS isn't the right solution here as all.

[+] radimm|3 years ago|reply

Having recently heard a lot of about PostgreSQL in Kubernetes (cloudNativePG for example) it always makes me wonder about the actual load and the complexity of the cluster in the question.

> This is the reason why we were able to easily cope with 2M+ requests in less than 24h when Midnight Society launched

This gives the answer, while it's probably not evenly distributed gives 23 req/sec (guess peak 60 - 100 might be already stretching it). Always wonder about use cases around 3 - 5k req/sec as minimum.

[edit] PS: not really ditching neither k8s pg nor AWS RDS or similar solutions. Just being curious.

[+] Nextgrid|3 years ago|reply

> 23 req/sec (guess peak 60 - 100 might be already stretching it)

That kind of load is something a decent developer laptop with an NVME drive can serve, nothing to write home about.

It is sad that the "cloud" and all these supposedly "modern" DevOps systems managed to redefine the concept of "performance" for a large chunk of the industry.

[+] xani_|3 years ago|reply

It's essentially just a process running in a cgroup so performance shouldn't be all that different than bare metal/VM postgresql.

Main difference would be storage speed and how it exactly is attached to a container.

[+] kccqzy|3 years ago|reply

> This is the reason why we were able to easily cope with 2M+ requests in less than 24h

I thought this was referring to 2M+ requests per second over a ramp period of 24h, not 2M requests per 24h?

[+] XCSme|3 years ago|reply

2M+ requests per day can be handled on a pretty cheap VPS even by MySQL, but it depends on the request complexity and, more importantly, the database size.

[+] brand|3 years ago|reply

I’ve personally deployed O(TBs) and O(10^4 TPS) Postgres clusters on Kubernetes with a CNPG-style operator based deployment. There are some subtleties to it but it’s not exceeding complicated, and a good project like CNPG goes a long way to shaving off those sharp edges. As other commenters have suggested it’s good to really understand Kubernetes if you want to do it, though.

[+] MuffinFlavored|3 years ago|reply

> Having recently heard a lot of about PostgreSQL in Kubernetes

I could never get a straight answer on whether running a database in a container (and mounting the storage volume through a bind mount/network drive or whatever) came with a performance hit compared to running it as a systemd service for example.

[+] qeternity|3 years ago|reply

These threads are always full of people who have always used an AWS/GCP/Azure service, or have never actually run the service themselves.

Running HA Postgres is not easy...but at any sort of scale where this stuff matters, nothing is easy. It's not as if AWS has 100% uptime, nor is it super cheap/performant. There are tradeoffs for everyone's use-case but every thread is full of people at one end of the cloud / roll-your-own spectrum.

[+] ftufek|3 years ago|reply

Honestly, that's what I initially thought trying to run ha postgres on k8s, but zalando's postgres operator made things so much easier (maybe even easier than RDS). Very easy to rollout as many postgres clusters with whatever size you want. We've been running our production db on it for the last 6 months or so, no outage yet. Though I guess if you have to have a very custom setup, it might be more difficult.

[+] api|3 years ago|reply

I wonder how many people use things like CockroachDB, Yugabyte, or TiDB? They're at least in theory far easier to run in HA configurations at the cost of some additional overhead and in some cases more limited SQL functionality.

They seem like a huge step up from the arcane "1980s Unix" nightmare of Postgres clustering but I don't hear about them that much. Are they not used much or are their users just happy and quiet?

(These are all "NewSQL" databases.)

[+] 988747|3 years ago|reply

I've been successfully running Postgres in Kubernetes with the Operator from Crunchy Data. It makes HA setup really easy with a tool called Patroni, which basically takes care of all the hard stuff. Running 1 primary and 2 replicas is really no harder than running single-node Postgres.

[+] jmarbach|3 years ago|reply

$0.50 per extra GB seems high, especially for a storage-intensive app. Given the cost of cloud Object Storage services it doesn't seem to make much sense.

Examples of alternatives for managed Postgres:

* Supabase is $0.125 per GB

* DigitalOcean managed Postgres is ~$0.35 per GB

[+] claytongulick|3 years ago|reply

I really wish I could use DO, but unless something has changed recently, they don't support delta backups, which is a deal killer for me.

For small startups, my DR/HA plan is hourly delta snapshots of the whole volume.

GCP, AWS and Azure all make this possible.

[+] makestuff|3 years ago|reply

SUpabase runs on AWS so they are either losing a ton of money, have some amazing deal with AWS, or the $0.50 is inaccurate.

[+] neilv|3 years ago|reply

I didn't see "backups" mentioned in that, though I'm sure they have them. Depending on your needs, it's a big thing to keep in mind while weighing options.

For a small startup or operation, a managed service having credible snapshots, PITR backups, failover, etc. is going to save a business a lot of ops cost, compared to DIY designing, implementing, testing, and drilling, to the same level of credibility.

One recent early startup, I looked at the amount of work for me or a contractor/consultant/hire to upgrade our Postgres recovery capability (including testing and drills) with confidence. I soon decided to move from self-hosted Postgres to RDS Postgres.

RDS was a significant chunk of our modest AWS bill (otherwise, almost entirely plain EC2, S3, and traffic), but easy to justify to the founders, just by mentioning the costs it saved us for business existential protection we needed.

[+] nunopato|3 years ago|reply

Thanks for bringing this up. We do have backups running daily, and we will have "backups on demand" soon as well.

[+] xwowsersx|3 years ago|reply

I've recently been spending a fair amount of time trying to improve query performance on RDS. This includes reviewing and optimizing particularly nasty queries, tuning PG configuration (min_wal_size, random_page_cost, work_mem, etc). I am using a db.t3.xlarge with general purpose SSD (gp2) for a web server that sees moderate writes and a lot of reads. I know there's no real way to know other than through testing, but I'm not clear on which instance type best serves our needs — I think it may very well be the case that the t3 family isn't fit for our purposes. I'm also unclear on whether we ought to switch to provisioned IOPS SSD. Does anyone have any general pointers here? I know the question is pretty open-ended, but would be great if anyone has general advice from personal experience?

[+] notac|3 years ago|reply

I'd recommend hopping off of t3 asap if you're searching for performance gains - performance can be extremely variable (by design). M class will even you out.

General storage IOPS is governed by your provisioned storage size. You can again get much more consistent performance by using provisioned IOPS.

Feel free to email me if you want to chat through things specific to your env - email is in my about:

[+] Nextgrid|3 years ago|reply

It's hard to say without metrics; what does your CPU load look like? In general, unless your CPU is often maxing out, changing the CPU is unlikely to help, so you're left with either memory or IO.

Unused memory on Linux will be automatically used to cache IO operations, and you can also tweak PG itself to use more memory during queries (search for "work_mem", though there are others).

If your workload is read-heavy, just giving it more memory so that the majority of your dataset is always in the kernel IO cache will give you an immediate performance boost, without even having to tweak PG's config (though that might help even further). This won't transfer to writes - those still require an actual, uncached IO operation to complete (unless you want to put your data at risk, in which case there are parameters that can be used to override that).

For write-heavy workloads, you will need to upgrade IO; there's no way around the "provisioned IOPS" disks.

[+] paulryanrogers|3 years ago|reply

General storage IOPS scales with disk size, roughly and to a point. It's often cheaper and faster to increase the instance storage than move to EBS, prioritized or not.

Of course if you need to recover quickly in a disaster you'll want a hot standby or replica. Still may be cheaper than PIOPs. (Especially if you need HA anyway.)

[+] ransom1538|3 years ago|reply

I operate a large fleet of mysql db instances. We cannot use Cloudsql (RDS competitor) due mainly to cost. BUT, one thing left out, was the ability to have complex topologies. EG. MasterA <- SlaveA[1..n] <- MasterB <- SlaveB[1..n]. With extremely high writes, being able to cut and shard where you want if very powerful. In this example you could write to MasterB with different data. If i need to filter a table in replication: done. We don't need to beg AWS RDS team for the option to change a db variable (I have done this). Warning: Doing this stuff at scale with massive bills is very stressful. It took about a year to get everything ironed out [snapshots, autoscaling, sharding, custom monitoring, etc].

[+] qubit23|3 years ago|reply

I was hoping to see a bit more of an explanation of how this was implemented.

[+] KaiserPro|3 years ago|reply

In this instance I can see the point, being able to give raw access to customer's own psql instance is a good feature.

but. It sounds bloody expensive to develop and maintain a reliable psql service on k8s

[+] geggam|3 years ago|reply

I would love to see the monitoring on this.

Network IOPs and NAT nastiness or disk IO the bigger issue ?

[+] HL33tibCe7|3 years ago|reply

Couldn’t you just spin up an RDS instance for each project (so, single-tenant RDS instances) to avoid the noisy neighbour problem? Or is that too expensive?

[+] elitan|3 years ago|reply

We could, yes. But way to expensive compared to our current setup.

We're offering free projects (Postgres, GraphQL (Hasura), Auth, Storage, Serverless Functions) so we need to optimize costs internally.

[+] techn00|3 years ago|reply

So what solution did you end up using? Crunchy operator?

[+] nesmanrique|3 years ago|reply

We evaluated several operators but at the end decided it would be best to deploy our own setup for the postgres workloads instead using helm.

[+] xyzzy_plugh|3 years ago|reply

If the cost of operating a postgres database is eating into your margins so much (and you can't simply adjust your prices to eat the difference) then I would suspect the wrong technology is in place.

Sure, RDS is expensive, but it's also quite well done. Almost every cloud platform service is more expensive than doing it yourself. No surprise here.

In the past I've deployed SQLite over Postgres for cost cutting reasons. It's not too difficult to swap out unless you're heavily bought into database features.

[+] movedx|3 years ago|reply

> Almost every cloud platform service is more expensive than doing it yourself. No surprise here.

In a business environment, this is actually not true unless you consider the extreme long term.

A Multi-AZ MySQL RDS instance of size db.m1.large (2x vCPU, 7.5GB of RAM), a 500GB standard disk, and an on-demand pricing model with 100% monthly utilization, will cost you approx. US$7,000 per year (rounding up.) That price gets you almost everything you can imagine from that service.

US$7,000 wouldn't get you my services for the time needed to setup a service that came even 30% as close in terms of reliability, feature parity and support.

RDS is not expensive (in the right environment.)

[+] mp3tricord|3 years ago|reply

In a production data base why are people executing long running queries on the primary. They should be using a DB replica.

[+] e-clinton|3 years ago|reply

Congrats on the launch. Curious to see what else is in store for this week.

Do I have to manually upgrade my old instances?

[+] elitan|3 years ago|reply

Thank you. It's going to be a fun week!

We're working on a one-click migration from RDS to a dedicated Postgres instance for older projects. Should be live in the next week or so.

[+] stunt|3 years ago|reply

What's the benefit of running Postgres in Kubernetes vs VMs (with replication obviously)?

[+] maxyurk|3 years ago|reply

did you consider https://www.pgbouncer.org/ ?

[+] 0xbadcafebee|3 years ago|reply

Ah, the 'ol sunk cost fallacy of infrastructure. We are already investing in supporting K8s, so let's throw the databases in there too. Couldn't possibly be that much work.

Sure, a decade-old dedicated team at a billion-dollar multinational corporation has honed a solution designed to support hundreds of thousands of customers with high availability, and we could pay a little bit extra money to spin up a new database per tenant that's a little bit less flexible, ..... or we could reinvent everything they do on our own software platform and expect the same results. All it'll cost us is extra expertise, extra staff, extra time, extra money, extra planning, and extra operations. But surely it will improve our product dramatically.

[+] suggala|3 years ago|reply

AWS RDS is 10x slower than BareMetal MySQL (both reads and writes). Slowness is mainly due to the reason that Storage is over network for RDS.

Not bad to invest some extra time to get better performance.

You are falling to “Appeal to antiquity” fallacy if you think something old is better.

[+] gw99|3 years ago|reply

I'm not so sure. All you have is another layer of abstraction between you and the problem that you are facing. And that level of abstraction may violate your SLAs unless you pitch $15k for the enterprise support option. And that may not even be fruitful because it relies on an uncertain network of folk at the other end who may or may not even be able to interpret and/or solve your problem. Also you are at the whim of their changes which may or may not break your shit.

Source: AWS user on very very large scale stuff for about 10 years now. It's not magic or perfection. It's just someone else's pile of problems that are lurking. The only consolation is they appear to try slightly harder than the datacentres that we replaced.

145 comments