top | item 46337156

(no title)

heipei | 2 months ago

I still don't get how folks can hype Postgres with every second post on HN, yet there is no simple batteries-included way to run a HA Postgres cluster with automatic failover like you can do with MongoDB. I'm genuinely curious how people deal with this in production when they're self-hosting.

discuss

order

franckpachot|2 months ago

It's largely cultural. In the SQL world, people are used to accepting the absence of real HA (resilience to failure, where transactions continue without interruption) and instead rely on fast DR (stop the service, recover, check for data loss, start the service). In practice, this means that all connections are rolled back, clients must reconnect to a replica known to be in synchronous commit, and everything restarts with a cold cache.

Yet they still call it HA because there's nothing else. Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.

Everything in SQL/RDBMS was thought for a single-node instance, not including replicas. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are guaranteed only on a single node.

One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.

About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation. And use logical replication for upgrades. But no solution offers both low operational complexity and high availability that can be comparable to MongoDB

franckpachot|2 months ago

Beyond the hype, the PostgreSQL community is aware of the lack of "batteries-included" HA. This discussion on the idea of a Built-in Raft replication mentions MongoDB as:

>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.

https://www.postgresql.org/message-id/0e01fb4d-f8ea-4ca9-8c9...

abrookewood|2 months ago

"I really dream PostgreSQL will be as reliable as MongoDB" ... someone needs to go and read up on Mongo's history!

Sure, the PostrgreSQL HA story isn't what we all want it to be, but the reliability is exceptional.

mfalcao|2 months ago

The most common way to achieve HÁ is using Patroni. The easiest way to set it up is using Autobase (https://autobase.tech).

CloudNativePG (https://cloudnative-pg.io) is a great option if you’re using Kubernetes.

There’s also pg_auto_failover which is a Postgres extension and a bit less complex than the alternatives, but it has its drawbacks.

tresil|2 months ago

If you’re running Kubernetes, CloudNativePG seems to be the “batteries included” HA Postgres cluster that’s becoming the standard in this area.

franckpachot|2 months ago

CloudNativePG is automation around PostgreSQL, not "batteries included", and not the idea of Kubernetes where pods can die or spawn without impacting the availability. Unfortunately, naming it Cloud Native doesn't transform a monolithic database to an elastic cluster

monus|2 months ago

We’ve recently had a disk failure in the primary and CloudNativePG promoted another to be primary but it wasn’t zero downtime. During transition, several queries failed. So something like pgBouncer together with transactional queries (no prepared statements) is still needed which has performance penalty.

jknoepfler|2 months ago

I use Patroni for that in a k8s environment (although it works anywhere). I get an off-the-shelf declarative deployment of an HA postgres cluster with automatic failover with a little boiler-plate YAML.

Patroni has been around for awhile. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.

It's basially push-button production PG.

There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.

tempest_|2 months ago

We use patroni and run it outside of k8s on prem, no issues in 6 or 7 years. Just upgraded from pg 12 to 17 with basically no down time without issue either.

wb14123|2 months ago

Yeah I'm also wondering that. I'm looking for self-host PostgreSQL after Cockroach changed their free tier license but found the HA part of PostgreSQL is really lacking. I tested Patroni which seems to be a popular choice but found some pretty critical problems (https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...). I tried to explore some other solutions, but found out the lack of a high level design really makes the HA for PostgreSQL really hard if not impossible. For example, without the necessary information in WAL, it's hard to enforce primary node even with an external Raft/Paxos coordinator. I wrote some of them down in this blog (https://www.binwang.me/2025-08-13-Why-Consensus-Shortcuts-Fa...) especially in the section "Highly Available PostgreSQL Cluster" and "Quorum".

My theory of why Postgres is still getting the hype is either people don't know the problem, or it's acceptable on some level. I've worked in a team that maintains the in house database cluster (even though we were using MySQL instead of PostgreSQL) and the HA story was pretty bad. But there were engineers manually recover the data lost and resolve data conflicts, either from the recovery of incident or from customer tickets. So I guess that's one way of doing business.

forinti|2 months ago

I love Postgresql simply because it never gives me any trouble. I've been running it for decades without trouble.

OTOH, Oracle takes most of my time with endless issues, bugs, unexpected feature modifications, even on OCI!

dpedu|2 months ago

This is my gripe with Postgres as well. Every time I see comments extolling the greatness of Postgres, I can't help but think "ah, that's a user, not a system administrator" and I think that's a completely fair judgement. Postgres is pretty great if you don't have to take care of it.

forinti|2 months ago

I manage Postgresql and the thing I really love about it is that there's not much no manage. It just works. Even setting up streaming replication is really easy.

christophilus|2 months ago

I’ve been tempted by MariaDB for this reason. I’d love to hear from anyone who has run both.

paulryanrogers|2 months ago

IMO Maria has fallen behind MySQL. I wouldn't chose it for anything my income depends on.

(I do use Maria at home for legacy reasons, and have used MySQL and Pg professionally for years.)

dangoodmanUT|2 months ago

Patroni, Zolando operator on k8s

groundzeros2015|2 months ago

Because that’s an expensive and complex boondoggle almost no business needs.

paulryanrogers|2 months ago

RDS provides some HA. HAProxy or PGBouncer can help when self hosting.

notaseojh|2 months ago

it's easy to through names out like this (pgbackrest is also useful...) but getting them setup properly in a production environment is not at all straightforward, which I think is the point.