Why would you run your own Postgres instance on EC2 within AWS? That kind of defeats the purpose of paying for AWS. Why not use Postgres RDS or Aurora?
It makes some sense with SQL Server and Oracle in a few cases because of licensing, but hosting your own Postgres instance on AWS is the worst of both worlds -- you're paying more than you would for a cheaper VPS, you have to do all of the maintenance yourself, and you're not taking advantage of the things AWS provides -- point-in-time restores, easy cross-region read replicas, faster disk I/O (Aurora), etc.
Hi. I'm one of the database engineers at Heap. This is a good question. There are several reasons why we use EC2. First of all, I will say I love RDS as a product. We actually do use RDS for a number of our services. We use Postgres on EC2 only for our primary data store. As for reasons why we use EC2:
Cost - Our primary data store has >1 petabyte of raw data stored across dozens of Postgres instances. The amount of data we store is at the point where RDS is too expensive for us. An instance on RDS costs more than twice as much as the same instance on EC2. For example, an on-demand r4.8xlarge instance on EC2 costs $2.13 an hour, while an RDS r4.8xlarge costs $4.80 an hour.
Performance - The only kind of disk available on RDS is EBS. EBS is slow compared to the NVMe the i3s provide. We used to use r3s with EBS and got a major speedup when we switched to i3s. As a side note, the cost of an i3 is also less than the cost of an r3 with an equivalent amount of EBS.
Configuration - By using EC2 we can configure our machines in ways we wouldn't be able to if we used RDS. For example, we run ZFS on our EC2 instances which compresses our data by 2x. By compressing our data, we get a major cost saving and a major performance boost at the same time! There isn't an easy way to compress your data if you use RDS.
Introspection - There are times when we've needed to debug performance problems with Postgres and EXPLAIN ANALYZE won't suffice. A good example: we used flame graphs to see where Postgres was spending CPU. We made a small change that resulted in a 10x improvement in ingestion throughput. If you are curious, I wrote a blog post on this investigation: https://heapanalytics.com/blog/engineering/basic-performance...
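The hourly figures in the cost point above pencil out to a sizeable monthly gap. A quick sketch (the instance count here is a hypothetical illustration, not Heap's actual fleet size):

```python
# Monthly EC2-vs-RDS delta using the on-demand prices quoted above.
# The 50-instance fleet is a made-up example for scale.
EC2_HOURLY = 2.13        # r4.8xlarge on EC2, USD/hour
RDS_HOURLY = 4.80        # r4.8xlarge on RDS, USD/hour
HOURS_PER_MONTH = 730

def monthly_delta(instances: int) -> float:
    """Extra monthly spend if the same fleet ran on RDS instead of EC2."""
    return instances * (RDS_HOURLY - EC2_HOURLY) * HOURS_PER_MONTH

print(f"${monthly_delta(1):,.0f} per instance per month")    # ~ $1,949
print(f"${monthly_delta(50):,.0f} for a 50-instance fleet")  # ~ $97,455
```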
In my experience you get much better performance outside of RDS and you can inspect and tune it better. Maybe I’m missing something and no doubt I could put more work into it but we’ve actually talked about moving our RDS dbs back to EC2 because there are plenty of queries we do that are embarrassingly slow on RDS when they shouldn’t be.
Also, you can’t replicate out of RDS. I like to know where my data is and how to bring it back online during a disaster.
Sometimes you're forced to use AWS for a very specific reason, technical or non-technical, subject to change in the future. (For example, literally being given a large 7 figure "credit" as occurred at a previous job.) Letting the application become completely and irreversibly dependent on [expensive] AWS services may not be desirable.
But who are we kidding... it's impossible to resist, and it's why AWS rakes in cash.
That said, not being able to use vDSOs for time-querying APIs isn't just a problem for databases; it's potentially a significant problem for asynchronous software, like Node.js, that does userspace event-scheduling bookkeeping. Typically every iteration of the event loop performs at least one time query, but depending on how the software is coded, there might be one or more time queries per event processed, per event-loop iteration.
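To make that per-call cost concrete, here's a rough way to measure it from Python. Absolute numbers include interpreter overhead, so treat them as an upper bound; the point is that on Linux, time.monotonic() bottoms out in clock_gettime(), which is only cheap when the vDSO fast path applies:

```python
import time

def ns_per_clock_call(n: int = 100_000) -> float:
    """Rough per-call cost, in nanoseconds, of a monotonic clock read."""
    start = time.perf_counter_ns()
    for _ in range(n):
        time.monotonic()   # clock_gettime(CLOCK_MONOTONIC) underneath
    return (time.perf_counter_ns() - start) / n

# On a host where the vDSO fast path works this is small;
# if every call has to trap into the kernel it is several times slower.
print(f"~{ns_per_clock_call():.0f} ns per clock query")
```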
Given that they're an analytics company, they probably have 2 problems with using Aurora/RDS.
Note that these are educated guesses from their statement: "Heap’s data is stored in a Postgres cluster running on i3 instances in EC2. These are machines with large amounts of NVMe storage—they’re rated at up to 3 million IOPS, perfect for high transaction volume database use cases"
1. Aurora charges per-request ($0.20/million). Given that analytics comes with tons of events and that they wanted servers that have up to 3 million IOPS, it can get pricey fast.
2. RDS has database instances that have SSDs that provide "up to 40,000 IOPS" per instance in their provisioned case, which is probably not enough.
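For a sense of scale on point 1, plugging the quoted $0.20 per million I/O requests into a sustained-rate estimate (the request rate below is a hypothetical illustration):

```python
# Aurora I/O cost sketch using the $0.20-per-million price quoted above.
# The sustained request rate is a made-up example.
PRICE_PER_MILLION_IO = 0.20     # USD per million I/O requests
SECONDS_PER_MONTH = 730 * 3600

def monthly_io_cost(requests_per_second: float) -> float:
    total = requests_per_second * SECONDS_PER_MONTH
    return total / 1_000_000 * PRICE_PER_MILLION_IO

# Sustaining even ~3% of the 3M IOPS those instances are rated for:
print(f"${monthly_io_cost(100_000):,.0f}/month")  # ~ $52,560/month
```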
Hi post author here! First off, we actually do use RDS for other databases. As you point out, having a lot of the operational stuff taken care of for you is great.
The post is specifically about our Citus cluster, which stores the analytics data for all our customers. Most of the reasons we do this have been given by other folks in the replies:
* RDS doesn't support the Citus extension
* data is stored on ZFS for filesystem compression
* we get significantly higher disk performance from these instances' NVMe attached storage, which isn't available for RDS
A couple of reasons, in this case for a Postgres instance that has 2TB of data.
1) price: you’ll easily spend $$$$$$ on RDS. If you host it on something equivalent with native SSD you’re looking at $800 a month with better performance
2) performance. It’s way faster and you can tune your indices, create views and make it fast and efficient in a predictable way.
3) if you want to, you can easily migrate to a different service. We did just that two months ago, from google cloud to AWS. It gives us vendor independence.
Heap is an analytics company that stores all (json) events automatically and then provides real-time queries. Their business sustainability is directly tied to how effectively they store and query this data.
They solve it with a large cluster of Postgres servers running on fast disks, with partitioning handled by the Citus extension, along with several low-level tweaks.
RDS does not support these scales or any clustering. Aurora does not have parallel processing. Redshift would be fast but is very expensive and does not have the same level of Postgres features.
There is Citus Cloud so you can get close to Heap's setup with Citus maintaining it all, but that gets pricey too.
If you compare the hourly cost for any instance on EC2 vs RDS, it's significantly more expensive for the managed solution (75%+ more), which is to be expected. I know people who roll their own for cost savings.
Postgres RDS is missing libprotobuf-c, which is a dependency for cutting MapBox Vector Tiles if you use PostGIS/Postgres that way. A small, legitimate exception to your statement.
Our case: AWS is not the only environment we support. We deploy the same Postgres version and the same config files and scripts across all major clouds, OpenStack, and even bare metal.
Well, if your business requires running databases really well it might be preferable to run your own. If they were using RDS they would not have been able to get this level of insight into their infra.
Maybe RDS is already configured correctly. Maybe not. But at a certain scale not having your hands tied becomes more valuable than having everything taken care of for you.
Well for one it makes it a lot easier to have a duplicate local environment (i.e., not on AWS) to test on before pushing changes into your production environment (on AWS). It also helps prevent vendor lock-in. I'll stick to managing my own database instances.
Defeats the purpose? There’s a lot that you cannot do with hosted software that you can do with your own custom setup. When you outgrow the hosted version the natural progression is to do it yourself.
You don't get superuser access to Postgres on RDS. You can't use logical replication. Plenty of other plugins don't work. For complex use cases, RDS is often a no-go.
Because I can get extremely high IOPS on EC2 without paying a fortune. EBS is slow compared to NVMe SSD drives. Stripe four NVMe drives together and you can get over 1 million IOPS.
At the time, we were trying to benchmark disk I/O for new platforms, but we found that things were underperforming compared to the specifications for the hardware. We figured out that fio was reading the clock before/after each I/O (which isn't really necessary unless you really care about latency measurement) and just by reading the clock we were rate limiting our I/O throughput. By switching to "clocksource=tsc" in our fio config, we managed to get the performance behavior we expected.
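For reference, the knob in question is fio's per-job clocksource option. A minimal job file along these lines (the device path, block size, and queue depth are illustrative placeholders; note that recent fio versions spell the TSC-based source clocksource=cpu rather than tsc):

```ini
; illustrative fio job -- device and sizing values are placeholders
[global]
clocksource=cpu     ; time with the CPU's TSC instead of clock_gettime()
ioengine=libaio
direct=1
bs=4k
iodepth=32

[randread]
filename=/dev/nvme0n1
rw=randread
time_based
runtime=60
```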
> Note that the 100ns mentioned above is largely due to the fact that my Linux box doesn’t support the RDTSCP instruction, so to get reasonably accurate timings it’s also necessary to issue a CPUID instruction prior to RDTSC to serialize its execution.
Huh? That’s definitely not true now, and I don’t think it ever was. Linux uses LFENCE or MFENCE, depending on CPU.
And, EC2 does not live migrate VMs across physical hosts. I couldn’t find anything explicit from AWS on this, but it’s something that Google is happy to point out.
Is it a good idea for a production database to depend on a feature not being used when the vendor hasn't said that they don't or won't use it? They may very well live-migrate when convenient, but just don't expose that functionality to customers since they don't want customers demanding it.
With the instance types they're using, live migration isn't really an option, because the point of the i3 instances is the locally attached NVMe SSDs where the database files live.
The "ephemeral" term is a legacy. Unfortunately it is part of the EC2 API for the block device mapping [1] of the "classic" instance store interfaces on the Xen platform. I don't know exactly when we stopped using "ephemeral" in our documentation, but I think it was with the introduction of EBS around 2008.
The "ephemeral" term confuses a lot of customers, and that's why we stopped using it. Data written to local storage is not transient, fleeting, or short lived. By 2010 we had transitioned to using "instance storage" in the documentation [2], which included a big note about how the data remains if an instance reboots for any reason (planned or unplanned).
Still, there is a misconception that data on local instance store volumes (both the more "classic" HDD or SSD volumes that are virtualized by Xen, as well as the new generation of local NVMe storage) could vanish due to this vestigial term that lingers in the API. Many customers, as well as services like Amazon Aurora [3], build highly durable and available systems on local instance storage.
Post author here. It's ephemeral, yes. It survives reboots, so that's not a problem. It doesn't survive instance-stop, so if a machine is being decommissioned by AWS we do indeed lose its data. As for how we protect against it, the main thing is replication: the data is stored on more than one machine. If we lose a machine for whatever reason, the shards from that machine are copied from a replica to another DB instance.
Reboots are actually fine, as ephemeral data will persist through a reboot on an EC2 instance. Your question is still valid in the case of a halt, though, and how you deal with it is specific to your application; you have to be able to handle all the data on that ephemeral disk disappearing without warning.
One way in the case of a database could be a second EC2 instance configured as a read replica in a different AZ.
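A minimal sketch of that setup using Postgres streaming replication (values are illustrative and the exact steps vary by Postgres version):

```ini
; postgresql.conf on the primary -- illustrative values
wal_level = replica       ; emit enough WAL for a standby to replay
max_wal_senders = 5       ; connection slots for replicas / base backups
wal_keep_segments = 256   ; retain WAL while a standby catches up
```

The standby in the other AZ is then seeded with pg_basebackup -R against the primary, which also writes the recovery settings pointing the replica back at it.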
For a low-throughput, less risky DB, running your own may make sense.
For a normal business-use DB, outsourcing the maintenance and risk to RDS may make sense.
If you're interested in clocks on Linux, you might also find this article useful (shameless plug): http://btorpey.github.io/blog/2014/02/18/clock-sources-in-li...
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-de...
[2] https://web.archive.org/web/20111113011016fw_/http://docs.am...
[3] https://www.allthingsdistributed.com/files/p1041-verbitski.p...