The PostgreSQL data directory format is not very stable or portable. You can't just ZIP it up and move it to a different machine, unless the new machine has the same architecture and "sufficiently" similar PostgreSQL binaries.
In theory the data directory works with any PostgreSQL binaries from the same major version of PostgreSQL, but I have seen cases where this fails e.g. because the binaries were from the same major version but compiled with different build options.
I would never ever zip up a PostgreSQL data directory and expect it to restore elsewhere. I would use a proper export. If the export is too slow, it could help to use streaming replication to write intermediate files which can be moved to a backup location.
Even with SQLite, for a simple file copy to work reliably, one has to set these three:
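(Since the three settings aren't quoted above: a minimal Python sketch of a route that avoids the question entirely, using SQLite's online backup API via the standard-library `sqlite3` module; file names are made up.)

```python
import sqlite3

# Instead of guessing the right PRAGMA combination for a raw file copy,
# use the online backup API, which copies the database page by page and
# yields a consistent snapshot even under concurrent writers.
src = sqlite3.connect("app.db")
dst = sqlite3.connect("app-backup.db")
src.backup(dst)
dst.close()
src.close()
```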
Yeah, I was going to mention, just upgrading between PG versions can be a bit of a pain. Dump/restore really seems like a less-than-stellar option if you have a LOT of data. I mean, you can stream through gzip/bzip2 to save space, but still.
I often wish that Firebird had a license that people found friendlier to use as it always felt like a perfect technical option for many use cases from embedded to a standalone server. PG has clearly eclipsed it at this point though.
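(A minimal sketch of the dump-and-compress idea from the comment above, in Python; the database name and file paths are made up, and it assumes `pg_dump`/`pg_restore` are on PATH. The custom format is already compressed, so a separate gzip stage is only needed for plain-text dumps.)

```python
import subprocess

# Dump in the custom format, which is compressed and lets pg_restore
# do selective or parallel restores on the target machine.
subprocess.run(
    ["pg_dump", "--format=custom", "--file=mydb.dump", "mydb"],
    check=True,
)

# On the new machine: connect to the maintenance database, recreate
# mydb from the dump, and restore its contents.
subprocess.run(
    ["pg_restore", "--create", "--dbname=postgres", "mydb.dump"],
    check=True,
)
```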
I wonder whether packaging everything in Docker (including a specific Postgres container identified by hash or whatever) and deploying on the same architecture would solve this?
Some apps do; the most widely used one I know of is Blackmagic's DaVinci Resolve, the video editor with a relatively full-featured free edition available. I think this has more to do with its roots in a high-end networked environment, but still, the local desktop version installs Postgres.
Oh, interesting! But that's more of a desktop application now, right? I was thinking of web servers when writing the article, but I can see how that's not totally clear. :-)
https://github.com/electric-sql/pglite-bindings
It would still be missing the standardized disk format aspect of SQLite, but otherwise it will be neat.
It's an embeddable Postgres you can run in process as a local client DB, just like SQLite but it's actually Postgres.
You got my hopes up, but it's WASM for now, not something I could add into a Golang [1] or Python app and have running like SQLite. OK, still hoping...!
[1] https://github.com/electric-sql/pglite/issues/89
For heavy-duty production use (i.e., pushing the actual HW limits), I would feel more comfortable with the SQLite vertical. Unix sockets are fast, but you are still going across process boundaries. All of SQLite can be run on the same thread as the rest of the application. This can dramatically reduce consumption of memory bandwidth, etc.
Memory bandwidth I don't worry about much - most of the time you should set up a small database with just enough data for that test, which hopefully is fast. However, sockets and processes are a worry, as there are places where things can go wrong that are unrelated to your test, and then you have flaky tests nobody trusts.
The huge advantage of SQLite is not that it's on the same machine; it's that it is in-process, which makes deployment and everything else just simpler.
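(To make the in-process point concrete, a tiny sketch using Python's built-in `sqlite3`; the file and table names are made up. There is no server to provision or socket to cross, just a library call against a file.)

```python
import sqlite3

# The whole "deployment" of the database is this one call: no daemon,
# no port, no credentials -- the library opens a file in-process.
con = sqlite3.connect("app.db")
con.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
con.execute("INSERT OR REPLACE INTO kv VALUES ('greeting', 'hello')")
con.commit()
print(con.execute("SELECT v FROM kv WHERE k = 'greeting'").fetchone()[0])
```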
I've been using Postgres as a local database for one of my personal projects (a GUI app), and to run Python tests that depend on it without having to install it in my environment.
I'm curious, when do you want to treat your Postgres like SQLite? :-) That's basically the opposite of what I was thinking of in the article.
I created a Python package that downloads an embedded Postgres binary, sets up the database, and gives you a database URL: https://github.com/kketch/tinypg
It downloads pg binaries from this project: https://github.com/zonkyio/embedded-postgres-binaries. There are other similar projects listed on that page that provide this for Java, Go, Rust, and Node.
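(This is not tinypg's actual API; see the repo for that. It's just a sketch of the initdb/pg_ctl dance that wrappers like these automate, assuming the PostgreSQL binaries are on PATH and port 5433 is free.)

```python
import subprocess, tempfile

# Create a throwaway cluster, start it, and stop it when done.
datadir = tempfile.mkdtemp(prefix="pg-")
subprocess.run(["initdb", "-D", datadir, "-U", "postgres"], check=True)
subprocess.run(["pg_ctl", "-D", datadir, "-o", "-p 5433", "-w", "start"],
               check=True)
# ...connect with postgresql://postgres@localhost:5433/postgres, run tests...
subprocess.run(["pg_ctl", "-D", datadir, "-m", "immediate", "stop"],
               check=True)
```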
When not using Python, I've been using this script to create ephemeral Postgres databases for tests, but also persistent ones in my dev containers: https://github.com/eradman/ephemeralpg
I've wrapped it with yet another shell script to make it usable just like this:
`export DB_URL=$(./pgtest.sh)`
This version (pgtest.sh) just creates a disposable, ephemeral Postgres: https://gist.github.com/kketch/d4e19a7fb6ebc1cfc265af44c1b41...
This version (pgdev.sh) starts a Postgres instance (if not already running) and persists the DB. It also supports seeding with a SQL script (dev_seed.sql): https://gist.github.com/kketch/88bb5b766994e247a9f2c37f13306...
https://github.com/fergusstrange/embedded-postgres
With SQLite, although all approaches are available, my fav is to use https://github.com/ncruces/go-sqlite3, which uses Wazero. I try to avoid CGO if I can because it adds compile-time complexity, making it unfriendly for a user to compile.
I do this using the Docker approach, especially for low-scale web apps that run on a single VM. I like that it's full Postgres, versus the sometimes odd limits of SQLite. My usual approach uses a Traefik container for SSL, a Django+gunicorn web app, and a Postgres container, all running as containers on one VM. Postgres uses a volume, which I back up regularly. For testing I use `eatmydata`, which turns off sync and speeds up test cycles by a couple percent.
I haven't tried the Unix socket approach, I suppose I should, but it's plenty performant as is. One project I built using this model hit the HN front page. Importantly, the "marketing page" was static content on a CDN, so the web app only saw users who signed up.
So you do periodic backups, not incremental on every write or something (read replica-like)?
It's important to me to not lose any data once committed, if at all possible.
(For testing, I've sped everything up by running migrations on `template1` and every test gets a random database name. Works wonders.)
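(A sketch of that `template1` trick, assuming psycopg2 and a local Postgres you're allowed to create databases on; names are made up.)

```python
import uuid
import psycopg2

# Migrations are applied to template1 once, out of band; every
# CREATE DATABASE below then inherits the schema for free.
admin = psycopg2.connect("dbname=postgres")
admin.autocommit = True  # CREATE DATABASE cannot run inside a transaction
name = f"test_{uuid.uuid4().hex}"
with admin.cursor() as cur:
    cur.execute(f'CREATE DATABASE "{name}" TEMPLATE template1')
# ...run the test against `name`, then throw it away:
with admin.cursor() as cur:
    cur.execute(f'DROP DATABASE "{name}"')
admin.close()
```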
I don’t recall the mechanics but I do know that folks have bundled starting a local instance of PG solely for unit tests.
There’s a pre-install step to get PG on the machine, but once there, the testing framework stands it up, and shuts it down. As I recall it was pretty fast.
The most annoying part of it is that Postgres absolutely detests running as UID 0 (root), which makes running `make check` in Docker containers an unnecessarily frustrating experience. I understand why Postgres rejects root by default, but I would really like for them to recognize `strcmp(getenv("I_AM_IN_DOCKER_AND_I_DONT_CARE"), "true") == 0` or something.
SQLite is used more in resource-constrained environments. If I had much memory to waste, I would've used a local PostgreSQL installation to begin with.
Note that advanced vector-embedding query techniques for approximate nearest neighbor search inevitably need an absurd amount of memory, so it typically doesn't make sense to use them in a resource-constrained environment in the first place.
Next embedded model version moved to 8MB flash and 64MB RAM with the same software infrastructure.
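(Rough, illustrative arithmetic behind that memory claim; the vector count and dimension are made up, not from the thread.)

```python
# Back-of-envelope numbers for an approximate-nearest-neighbor index.
n_vectors = 10_000_000
dims = 1536                 # a common embedding size
bytes_per_float = 4
raw = n_vectors * dims * bytes_per_float
print(f"raw vectors alone: {raw / 2**30:.0f} GiB")  # ~57 GiB
# Graph-based indexes like HNSW add per-vector link overhead on top of
# this and generally want to be RAM-resident -- hopeless on something
# like the 64 MB device mentioned above.
```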
If I am not mistaken, DuckDB is suitable for columnar analytics queries, less so for multi-column row extractions. Which PG-like functionality does it offer on top?
No, it's not "PG functionality". It's close to SQL-standard compliance, but not close to what Postgres has to offer. Also: a single writing transaction at a time, in-process, etc.
Am I getting old? Seems obvious to me.
I meet an unexpected number of young devs who:
- Find hypermedia to be exotic. A JSON web API is the only way that makes sense to them.
- Say they use the cloud to "save money".
- Use third-party services for things as basic as authentication because "it's hard".
- Encode security stuff in the frontend code.
It's the cycle of life.
> You can just install Postgres on your single big and beefy application server (because there’s just the one when you use SQLite, scaled vertically), and run your application right next to it.
In the cloud, as you probably know, the usual way now is to spin up Postgres separately (RDS, Supabase, Planetscale, Crunchy Bridge, you name it). We've gotten so used to it that a different way of doing it is often not even considered.
But I think tooling has come a long way, and there have been a lot of learnings from the cloud, so it's time to swing the pendulum back and reconsider assumptions!
https://electric-sql.com/
You'd still be incurring client/server IPC for individual database queries, which would cost you one of the benefits of using SQLite (IDGAF-queries that just hammer tables repeatedly because there's almost no cost to doing so).
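(A sketch of how one might measure that per-query round-trip cost, assuming psycopg2 and a local Postgres reachable as `dbname=postgres`; absolute numbers will vary.)

```python
import time
import sqlite3
import psycopg2

def bench(run, n=10_000):
    t0 = time.perf_counter()
    for _ in range(n):
        run()
    return time.perf_counter() - t0

lite = sqlite3.connect(":memory:")
pg = psycopg2.connect("dbname=postgres")

def sqlite_query():
    lite.execute("SELECT 1").fetchone()

def pg_query():
    with pg.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()

# The in-process call avoids a socket round trip per query, which is
# what makes "hammer the table in a loop" coding styles cheap in SQLite.
print("sqlite  :", bench(sqlite_query))
print("postgres:", bench(pg_query))
```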
I upvoted this because, while it was critical, it didn't feel mean-spirited and it was factually correct.
After fully reading the article, I came to understand it really was not referring to anything SQLite-specific; it was really "what if you ran Postgres as an application on a server". There is nothing more to be gained from reading the article beyond this, and this is kind of the most basic deployment model for Postgres for, like, the last 40 years.