Works with any PG version today. Each branch is a fully isolated PostgreSQL container with its own port. ~2-5 seconds for a 100GB database.
https://github.com/elitan/velo
Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.
Despite all of the complaints in other comments about the use of Claude Code, it looks interesting and I appreciated the video demo you put on the GitHub page.
Thanks for sharing, it's an interesting approach. I'm not sure why people are complaining; most software is written with the help of agents these days.
I set this up for my employer many years ago when they migrated to RDS. We kept bumping into issues on production migrations that would wreck things. I decided to do something about it.
The steps were basically:
1. Clone the AWS RDS DB, or spin up a new instance from a fresh backup.
2. Get the ARN and, from that, the CNAME or public IP.
3. Plug that into the DB connection in your app.
4. Run the migration on pseudo-prod.
This helped us catch many bugs that were specific to production DB or data quirks and would never have been caught locally or even in CI.
Then I created a simple Ruby script to automate the above and threw it into our integrity checks before any deployment. Last I heard they were still using that script I wrote in 2016!
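A rough Python sketch of that automation (the original script was Ruby; the instance and snapshot identifiers here are made up):

    import boto3

    rds = boto3.client("rds")

    # 1. Spin up a new instance from a fresh snapshot of prod
    #    (identifiers are hypothetical).
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier="pseudo-prod",
        DBSnapshotIdentifier="prod-latest-snapshot",
    )

    # 2. Wait until it's available, then grab the endpoint (the CNAME).
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="pseudo-prod")
    desc = rds.describe_db_instances(DBInstanceIdentifier="pseudo-prod")
    endpoint = desc["DBInstances"][0]["Endpoint"]["Address"]

    # 3 + 4. Point the app's DB config at `endpoint` and run the migration
    # against pseudo-prod, e.g.:
    #   DATABASE_HOST=<endpoint> rake db:migrate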
I love those "migration only fails in prod because of data quirks" bugs. They are the freaking worst. I've called off releases in the past because of them.
Really interesting article, I didn't know that the template cloning strategy was configurable. Huge fan of template cloning in general; I've used Neon to do it for "live" integration environments, and I have a golang project https://github.com/peterldowns/pgtestdb that uses templates to give you ~unit-test-speed integration tests that each get their own fully-schema-migrated Postgres database.
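For anyone curious how the template trick works underneath, here's the underlying SQL driven from Python (this is not pgtestdb's actual API; "testdb_template" is a made-up name for a database with all migrations already applied):

    import uuid
    import psycopg2

    # CREATE DATABASE can't run inside a transaction block.
    admin = psycopg2.connect(dbname="postgres", user="postgres")
    admin.autocommit = True

    def create_test_db():
        # Cloning from a template is cheap; the template itself must have
        # no active connections while the clone is being made.
        name = f"test_{uuid.uuid4().hex}"
        with admin.cursor() as cur:
            cur.execute(f'CREATE DATABASE "{name}" TEMPLATE testdb_template')
        return name  # each test connects to its own db and drops it afterwards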
Back in the day (2013?) I worked at a startup where the resident Linux guru had set up "instant" staging environment databases with btrfs. Really cool to see the same idea show up over and over with slightly different implementations. Speed and ease of cloning/testing is a real advantage for Postgres and Sqlite, I wish it were possible to do similar things with Clickhouse, Mysql, etc.
For anyone looking for a simple GUI for local testing/development of Postgres-based applications: I built a tool a few years ago that simplifies the process: https://github.com/BenjaminFaal/pgtt
Is this basically using templates as "snapshots", and making it easy to go back and forth between them? Little hard to tell from the README but something like that would be useful to me and my team: right now it's a pain to iterate on sql migrations, and I think this would help.
Uff, I had no idea that Postgres v15 introduced WAL_LOG and changed the default from FILE_COPY. For (parallel CI) test envs, it makes so much sense to switch back to the FILE_COPY strategy ... and I previously actually relied on that behavior.
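The strategy is selectable per CREATE DATABASE since v15, so you can opt back in. A minimal sketch (database and template names are placeholders):

    import psycopg2

    conn = psycopg2.connect(dbname="postgres", user="postgres")
    conn.autocommit = True  # CREATE DATABASE can't run in a transaction
    with conn.cursor() as cur:
        # v15+ defaults to STRATEGY = WAL_LOG, which WAL-logs every page of
        # the new database. FILE_COPY copies the template's files directly
        # (at the cost of forcing checkpoints), which is often faster when
        # spawning many small test databases.
        cur.execute(
            "CREATE DATABASE testdb TEMPLATE testdb_template STRATEGY = FILE_COPY"
        )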
In theory, a database that uses immutable data structures (the hash array mapped trie popularized by Clojure) could allow instant clones on any filesystem, not just ZFS/XFS, and allow instant clones of any subset of the data, not just the entire db. I say "in theory" but I actually built this already so it's not just a theory. I never understood why there aren't more HAMT based databases.
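The structural-sharing idea, illustrated with pyrsistent's PMap (a HAMT, like Clojure's maps) rather than the poster's actual database:

    from pyrsistent import pmap

    db_v1 = pmap({"users/1": "ada", "users/2": "bob"})

    # A "clone" is just another reference to the same trie: O(1), no copying.
    snapshot = db_v1

    # Updates return a new map sharing almost all structure with the old one.
    db_v2 = db_v1.set("users/1", "ada lovelace")

    assert snapshot["users/1"] == "ada"        # old version is untouched
    assert db_v2["users/1"] == "ada lovelace"  # new version diverges cheaply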
Does Datomic have built-in cloning functionality? I've been wanting to try Datomic out but haven't felt like putting in the work to make a real app lol
Any non-trivial amount of data and you’ll run into non-trivial problems.
For example, some of our PG databases got into such a state that we had to write a custom migration tool, because we couldn't copy data to a new instance using standard tools. We had to rewrite the schema to use custom partitions because performance of built-in partitioning degrades as the number of partitions gets high, and so on.
Once upon a time, MySQL/InnoDB was a better performance choice for UPDATE-heavy workloads. There was a somewhat famous blog post about this from Uber[1]. I'm not sure to what extent this persists today. The other big competitor is sqlite3, which fills a totally different niche for running databases on the edge and in-product.
Personally, I wouldn't use any SQL DB other than PostgreSQL for the typical "database in the cloud" use case, but I have years of experience both developing for and administering production PostgreSQL DBs, going back to 9.5 days at least. It has its warts, but I've grown to trust and understand it.
1: https://www.uber.com/blog/postgres-to-mysql-migration/
To be fair, Postgres still suffers from a poor choice of MVCC implementation (copy-on-write rather than an undo log). This one small choice has a huge number of negative knock-on effects once your load becomes non-trivial.
As an aside, I just jumped around and read a few articles. This entire blog looks excellent. I’m going to have to spend some time reading it. I didn’t know about Postgres’s range types.
Assuming I'd like to replicate my production database, either for staging or to test migrations, etc.,
and that most of my data is either:
- business entities (users, projects, etc)
- and "event data" (sent by devices, etc)
where most of the database size is in the latter category, and that I'm fine with "subsetting" those (e.g. getting only the last month's "event data"),
what would be the best strategy to create a kind of "staging clone"? Ideally I'd like to tell the database (logically, without locking it explicitly): act as though my next operations only apply to items created/updated BEFORE "currentTimestamp", and then:
- copy all my business tables (any update to those after currentTimestamp would be ignored magically, even if it happens during the copy)
- copy a subset of my event data (same constraint)
What's the best way to do this?
Aurora clones are copy-on-write at the storage layer, which solves part of the problem, but RDS still provisions you a new cluster with its own endpoints, etc., which is slow (~10 minutes), so not really practical for the integration testing use case.
Is anyone aware of something like this for MariaDB?
Something we've been trying to solve for a long time is having instant DB resets between acceptance tests (in CI or locally) back to our known fixture state, but right now it takes decently long (like half a second to a couple seconds, I haven't benchmarked it in a while) and that's by far the slowest thing in our tests.
I just want fast snapshotted resets/rewinds to a known DB state, but I need to be using MariaDB since it's what we use in production, we can't switch DB tech at this stage of the project, even though Postgres' grass looks greener.
I was able to accomplish this by running each test within its own transaction that gets rolled back after each test. This way I'm allowed to modify the database to suit my needs for each test, and then it gets magically reset back to its known state for the next test. Transaction rollbacks are very quick.
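A minimal pytest sketch of that pattern against MariaDB (connection details are placeholders). The caveat: the code under test must never commit, and DDL won't roll back since MariaDB implicitly commits around it:

    import pytest
    import pymysql

    @pytest.fixture
    def db():
        conn = pymysql.connect(host="localhost", user="test", database="fixtures")
        conn.begin()           # open a transaction around the whole test
        try:
            yield conn         # the test reads/writes freely inside it
        finally:
            conn.rollback()    # discard every change: instant reset to fixtures
            conn.close()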
This is really cool and I love to see the interest in fast clones / branching here.
We've built Xata with this idea of using copy-on-write database branching for staging and testing setups, where you need to use testing data that's close to the real data. On top of just branching, we also do things like anonymization and scale-to-zero, so the dev branches are often really cheap. Check it out at https://xata.io/
> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one. For production use, this usually means you create a dedicated template database rather than cloning your live database directly.
This is a key limitation to be aware of. One way to work around it is to use pgstream (https://github.com/xataio/pgstream) to copy from the production database to a production replica. Pgstream can also do anonymization on the way; this is what we use at Xata.
OP here - yes, this is my use case too: integration and regression testing, as well as providing learning environments. It makes working with larger datasets a breeze.
We do this, preview deploys, and migration dry runs using Neon Postgres's branching functionality. One benefit of that vs. this seems to be that it works even with active connections, which is good for doing these things on live databases.
This is really cool, looking forward to trying it out.
Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.
We just build the database, commit it to a container image (without volumes attached), and programmatically stop and restart the container per test class (testcontainers.org). The overhead is < 5 seconds and our application recovers to the reset database state seamlessly. It's been awesome.
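A sketch of that setup with the Docker SDK (image names are made up; the key detail is keeping PGDATA out of the image's declared volume so `docker commit` actually captures the data):

    import docker

    client = docker.from_env()

    # One-time setup: put PGDATA outside the declared volume so the database
    # files land in the container layer, then bake that layer into an image.
    base = client.containers.run(
        "postgres:16", detach=True,
        environment={"POSTGRES_PASSWORD": "test", "PGDATA": "/pgdata"},
    )
    # ... wait for readiness, run migrations, load fixtures ...
    base.commit(repository="myapp-testdb", tag="fixtures")
    base.remove(force=True)

    # Per test class: a fresh container from the baked image boots with the
    # exact fixture state, so reset == remove + run.
    def fresh_db_container():
        return client.containers.run(
            "myapp-testdb:fixtures", detach=True,
            ports={"5432/tcp": None},  # random free host port
        )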
I wonder if this is the new normal. Somebody says "I built Xyz" but then you realize it's vibe coded.
Mind you, I'm not saying it's bad per se. But shouldn't we be open and honest about this?
Also, the Docker link seems to be broken.
Raised an issue in my previous pet project for doing concurrent integration tests with real PostgreSQL DBs (https://github.com/allaboutapps/integresql) as well.
I'm wondering why anyone would want to use anything else at this point (for SQL).
* MySQL has a much easier story of master-master replication.
* Mongo has a much easier story of geographic distribution and sharding. (I know that Citus exists, and have used it.)
* No matter how you tune Postgres, columnar databases like Clickhouse are still faster for analytics / time series.
* Write-heavy applications still may benefit from something like Cassandra, or more modern solutions in this space.
(I bet Oracle has something to offer in the department of cluster performance, too, but I did not check it out for a long time.)
Something like:
https://www.postgresql.org/docs/current/sql-copy.html
It'd be really nice if pg_dump had a "data sample"/"data subset" option, but unfortunately nothing like that is built in that I know of.
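A sketch of that approach for the parent's use case: one REPEATABLE READ transaction pins a single snapshot, so every COPY sees the data as of the same instant even while writes continue. Table and column names (users, projects, events, created_at) come from the question and are otherwise hypothetical:

    import io
    import psycopg2

    src = psycopg2.connect("dbname=prod")           # source, e.g. a replica
    dst = psycopg2.connect("dbname=staging_clone")  # fresh clone, schema applied
    src.set_session(isolation_level="REPEATABLE READ")  # one snapshot for all COPYs

    s, d = src.cursor(), dst.cursor()
    copies = [
        ("COPY users TO STDOUT", "users"),          # business tables: full copy
        ("COPY projects TO STDOUT", "projects"),
        # event data: subset to the last month only
        ("COPY (SELECT * FROM events"
         " WHERE created_at >= now() - interval '1 month') TO STDOUT", "events"),
    ]
    for sql, table in copies:
        buf = io.StringIO()      # buffered in RAM; stream to disk for big tables
        s.copy_expert(sql, buf)
        buf.seek(0)
        d.copy_expert(f"COPY {table} FROM STDIN", buf)
    dst.commit()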