
Migrating Large Heroku Postgres Instances to AWS Aurora Without Downtime

124 points | sciguymcq | 5 years ago | thecodinginterface.com

39 comments

[+] 0xCMP|5 years ago|reply
Very thorough and nice article! It's really nice to have this all in one place for reference later.

In skimming this page I wish the following had been more prominent:

> In the Real World for this process to work you need to request for the Heroku Data Support team to establish continuous WAL log shipping to an AWS S3 bucket along with a base physical backup using the WAL-E Python based library.

I kept searching in vain for where exactly you were able to get a WAL-E backup into the bucket and finally found that line.
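For reference, a rough sketch of what that WAL-E setup amounts to (bucket name, data directory paths, and Postgres version below are made-up examples, not what Heroku actually uses internally):

```shell
# Hypothetical names throughout; Heroku Data Support performs the source side.
export WALE_S3_PREFIX='s3://my-wal-bucket/prod-db'   # example bucket
export AWS_REGION='us-east-1'

# One-time base physical backup of the data directory:
wal-e backup-push /var/lib/postgresql/12/main

# Continuous WAL shipping, driven from postgresql.conf on the source:
#   archive_mode    = on
#   archive_command = 'wal-e wal-push %p'

# On the receiving EC2 instance, restore the base backup, then replay WAL:
wal-e backup-fetch /var/lib/postgresql/12/main LATEST
#   restore_command = 'wal-e wal-fetch %f %p'   # in the recovery configuration
```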

[+] ekump|5 years ago|reply
This is a great write-up! I wrote a far less thorough blog post on the same topic a few months ago: https://vericred.github.io/2020/08/10/migrating-to-aurora.ht....

Overall, we're happy with our decision to migrate to Aurora. We feel it offers better performance, observability and scalability compared to heroku postgres. To echo some other comments:

1. Jesse at Heroku is great. I had never set up my own postgres instance before, and his guidance helped me a lot.

2. We used DMS, and it was ok, not great. Dealing with LOBs was a bit of a pain. In order to keep the migration time to a minimum we set up multiple DMS tasks and grouped tables into those tasks based on their max LOB size. Most of our tables didn't have any LOB columns and they migrated quickly.

3. The ability to fast clone a database is nice. We can create a copy of our production database in about five minutes for use in non-prod environments. This would take a couple of hours in heroku.

4. The Aurora IO costs are ridiculous. Are there any good tools to see what is exactly causing so much I/O?
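A back-of-the-envelope sketch of why the bill surprises people: Aurora charges per storage I/O (roughly one page fetch), not per query, so the hidden multiplier is how many pages each query touches. The $0.20-per-million figure below is an assumed list price for illustration; check current AWS pricing.

```python
def aurora_io_cost(queries: int, pages_per_query: float,
                   price_per_million_ios: float = 0.20) -> float:
    """Estimate Aurora I/O cost for a month.

    Aurora bills per storage I/O (a page fetch), so one query that
    scans many pages is counted many times, not once.
    """
    ios = queries * pages_per_query
    return ios / 1_000_000 * price_per_million_ios

# 10M queries/month that each touch a single page: about $2
print(aurora_io_cost(10_000_000, 1))
# The same 10M queries each scanning 500 pages: roughly $1000
print(aurora_io_cost(10_000_000, 500))
```

The second line is why sequential scans on large tables show up on the invoice long before they show up in latency graphs.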

[+] craigkerstiens|5 years ago|reply
DMS under the covers is another tool. The Postgres support for it lags pretty badly and is mostly an afterthought. It's much better to leverage something more native: either the WAL, or, if on a more modern version, logical replication.

For coming off Heroku, DMS isn't going to work because you don't have a replication slot, so your only approach is wal-e, or maybe wal-g?

Fast cloning is one nice feature of Aurora for sure, but when you go into Aurora you're starting to deviate more and more from native Postgres. Fast cloning can definitely be done without Aurora. We're working on bringing our ability on Crunchy Bridge down to a few minutes, similar to Aurora, while still being 100% Postgres. Not there yet, but I suspect we're a few months away.

[+] aeyes|5 years ago|reply
> 4. The Aurora IO costs are ridiculous. Are there any good tools to see what is exactly causing so much I/O?

I don't use Aurora so I don't know what is available but pg_stat_statements does have I/O statistics so you can review it manually or use a tool like pgAnalyze (not sure if it supports Aurora) to see what queries are I/O intensive.
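To make that concrete, a query along these lines surfaces the statements doing the most block I/O (column names as of PostgreSQL 12 and earlier; 13+ renames some of the timing columns):

```sql
-- Requires the pg_stat_statements extension (via shared_preload_libraries);
-- the io_ms column is only populated when track_io_timing = on.
SELECT query,
       calls,
       shared_blks_read + shared_blks_written AS disk_blocks,
       blk_read_time + blk_write_time         AS io_ms
FROM   pg_stat_statements
ORDER  BY shared_blks_read + shared_blks_written DESC
LIMIT  20;
```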

[+] nijave|5 years ago|reply
Not sure about cost and availability, but Performance Insights for RDS Postgres will show waits on various resources (disk, CPU) and is fairly easy to use. If it's like RDS Postgres, you can set up logging and use something like pgbadger (you would need to adjust the parameter group logging options to capture the right details).
[+] shangxiao|5 years ago|reply
Nice!

Slight nitpick: Django should automatically create the Hstore extension for you [1] during migrate whenever you use a HStoreField

[1] https://github.com/django/django/blob/f5e07601b233a50e6bcca4...

Edit: I might've been mistaken here in thinking the above migration operation was automatically applied, as the docs seem to suggest you need to add it yourself [2], but at least it's as easy as adding an operation ;)

[2] https://docs.djangoproject.com/en/3.1/ref/contrib/postgres/f...
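For anyone following along, a sketch of such a migration (app and dependency names are placeholders; `HStoreExtension` is the operation from `django.contrib.postgres.operations` that the linked docs describe):

```python
# Hypothetical migration file, e.g. myapp/migrations/0002_enable_hstore.py
from django.contrib.postgres.operations import HStoreExtension
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ("myapp", "0001_initial"),  # point at your app's previous migration
    ]

    operations = [
        HStoreExtension(),  # issues CREATE EXTENSION IF NOT EXISTS hstore
    ]
```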

[+] pedrokost|5 years ago|reply
The author gained a significant amount of Postgres-related knowledge while performing this migration.

Taking this into consideration, is it still worth using the managed Aurora, instead of the EC2 self-managed instance?

[+] sciguymcq|5 years ago|reply
I currently manage, or am "lead dev", for around a half dozen to 10 apps using Postgres as the database. For two of those I use a managed database service (AWS RDS / AWS Aurora Postgres) because they drive mission critical high value products / services and I wouldn't consider using a standalone install of Postgres. For other apps that are not so mission critical I am perfectly happy saving considerable money and running them on a VPS on AWS EC2 or Digital Ocean or Linode. So ultimately it depends on the app and the risk tolerance for Recovery Time Objective and Recovery Point Objective.
[+] takeda|5 years ago|reply
If you have a CM infrastructure set up, you're not gaining much. Tools like repmgr/barman (in public cloud you probably would use WAL-E/WAL-G) provide nearly all of the benefits.

PostgreSQL is generally low maintenance. The only cases where this might not be true are when you're hitting performance bottlenecks and need to tune your database, optimize your tables, use a bigger instance, etc., but RDS doesn't save you from that.

[+] jrochkind1|5 years ago|reply
RDS Postgres (actual postgres, not the Aurora product which can be "mostly postgres compatible") is another option, not just "EC2 self-managed instance".

I guess we'd have to look at price comparisons for the size you actually need. I think depending on load, the price differences may be nominal.

[+] mijoharas|5 years ago|reply
Might have been nice to see this post a year ago :).

My company migrated our ~2TB postgres db from heroku to AWS (originally aurora, we switched back to RDS postgres). Haven't had time for a blog post, but figure I might post our learnings here, and the differences in our approach.

* We did the second step (ec2 -> aurora) via AWS DMS (database migration service). This service may be a little more of a pain than it's worth, but it's what we settled on.

* we used wal-g rather than wal-e for the initial phase, it has slightly better performance, and we were happy with the tool.

* We wrote Ansible playbooks to perform the heroku -> ec2 migration. It's definitely nice, and makes you feel a little more secure, having run the db migration multiple times over, and it would be a giant PITA to do all that manually. (I hadn't used Ansible much before; I have to say, I'm a fan! I had used chef/puppet in bits and pieces before and was never that keen on them.)

* Aurora wasn't the right fit for us. Be careful when reading the pricing. The AWS pricing calculator asks how many million requests you have per month. This is _not_ how it's charged. If you read the docs, the IO charges apply to AuroraReadIO and AuroraWriteIO, which are not per db request but per db storage request, i.e. number of requests * number of db blocks each request fetches.

* Our Aurora costs were through the roof (and variable and spiky, never something you want!). We migrated again to RDS postgres, and by bumping the provisioned IOPs up we had equivalent performance (possibly better?) for half the price, and the price was stable. We benchmarked everything but the cost with the db migration, instead relying on the AWS simple pricing calculator. The difference between predicted and actual storage IO costs was over 10x out, if my memory serves (making the entire DB double the price per month). BEWARE!

* AWS DMS is kinda buggy. Last I checked, the table selection editor UI didn't work when you added/modified multiple fields. You can switch to a JSON editor, so generate the table selection once, then check it in the JSON editor in case the UI got borked.

* Speed wasn't great with AWS DMS either; we had to flip BatchApplyEnabled on, and take a lot of care with it.

* There were definitely some shenanigans to do with LOBs, but I can't fully remember them.

* DMS does not update id sequences, so if you have any tables with incrementing ids, make sure that you set the sequences manually before switching dbs.

* We used Debian as our intermediate ec2 instance. We've historically used Ubuntu for many ec2 instances, but the default Ubuntu AMI image uses MBR for its partition (!!! in 2020!!!), and while we had some code to swap it to GPT while running, that felt a little too spicy for a db migration, so we used base Debian, whose AMI has a sane GPT-partitioned disk. (Note: we needed to create partitions larger than what MBR can support, so this was a hard requirement.)

* The last learning I should add, is that heroku support were actually really great during the whole process, very knowledgeable and gave some setup example files for the ec2 instance. After that they basically leave it to you, but I can't speak highly enough about them (thanks jessie!)

* One more for you: be careful about the version of libc that you have. Even if the Postgres versions are identical, subsequent libc versions can have slightly different string sorting, which can corrupt indexes that are migrated after copying the base backup. (We needed to rebuild an index or two due to this; not a huge deal, but an interesting gotcha. I swear I checked our locale settings 5 times when it went wrong until I clocked on to it being the libc version.)

There were probably more learnings than that, but that's what comes to mind.

(EDIT: added final bullet point). (EDIT 2: added one more bullet)
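The sequence gotcha in the DMS bullet above bites almost everyone, so here's a small sketch of the fix (table and column names are examples): generate one `setval` per serial column and run the output against the target after the full load completes.

```python
def setval_statements(tables):
    """Generate SQL that resets each serial sequence to the table's
    current max id, since DMS copies rows but never advances sequences."""
    return [
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"COALESCE((SELECT MAX({column}) FROM {table}), 1));"
        for table, column in tables
    ]

# Example: emit the statements for two hypothetical tables.
for stmt in setval_statements([("users", "id"), ("orders", "id")]):
    print(stmt)
```

Running the printed statements via psql on the target right before cutover avoids duplicate-key errors on the first inserts after the switch.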

[+] sciguymcq|5 years ago|reply
I originally wanted to use DMS but our DB failed the pre-migration assessment so that forced us to Logical Replication.

We also worked with Jesse. You are right he is amazingly helpful. Unfortunately we got some pretty questionable help from others before he was assigned to our support ticket but once that happened it was infinitely improved.
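For anyone who ends up on the logical replication path instead of DMS, the core of it is a publication on the source and a subscription on the target (names and connection details below are placeholders):

```sql
-- On the source (requires wal_level = logical and a free replication
-- slot, which is exactly what stock Heroku Postgres doesn't give you):
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- On the Aurora/RDS target, after copying the schema with pg_dump --schema-only:
CREATE SUBSCRIPTION migration_sub
    CONNECTION 'host=source-db.example.com dbname=app user=replicator password=...'
    PUBLICATION migration_pub;

-- Watch replication progress on the source until lag is near zero,
-- then stop writes and cut over:
SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots;
```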

[+] yowlingcat|5 years ago|reply
Fantastic comment, thank you so much. I've been following Aurora for a while but a little curious (and skeptical) about sharp edges vs vanilla PG. If I may ask you a follow-on question, who do you think Aurora would be the right fit for?
[+] tenaciousDaniel|5 years ago|reply
I'm considering moving from a self-hosted Mongo setup to Aurora, either with PG or MySQL. Anyone here have opinions on Aurora?
[+] bastawhiz|5 years ago|reply
It does the job, but aurora postgres lags behind mysql when it comes to features. Failovers are surprisingly slow, too. You're pretty much stuck with cloudwatch for monitoring, which has significant lag.
[+] mnutt|5 years ago|reply
Aurora (Postgres) looked like it would be great, and the performance for our use case was much better than RDS. But there was a showstopper for us with the replication: If the cluster leader fails, _every read replica will force a restart_ to elect a new leader, resulting in minimum ~30s complete cluster downtime. In our situation we have no problem with a short downtime in write availability, but the fact that we can't maintain read availability via replicas is a deal breaker.
[+] dageshi|5 years ago|reply
If you're thinking of going for the version that auto sleeps when not in use be aware there's a 30-40 second startup time from first incoming request. If it's a database you don't need to be accessible except in bursts this can be really handy for cutting costs as you're not paying for any read/write resources you're not using. Of course if you do need it generally accessible all the time then you'd want to provision it accordingly.
[+] mbesto|5 years ago|reply
I've heard mixed reviews on Aurora MySQL. Some people move to it and see huge performance gains, others simply don't or have some limitations.
[+] atanasovskib|5 years ago|reply
I would suggest checking out other smaller cloud providers for managed databases over Aurora. Check out Aiven.io
[+] patman81|5 years ago|reply
That's impressive. We struggle to find a way to "migrate" from AWS Aurora 1 to 2 (that is from MySQL 5.6 to 5.7). There seems to be no (easy) way to upgrade without downtime for large databases.