Another amazing release, congrats to all the contributors. There are simply too many things to call out - just a few highlights:
Massive improvements to vacuum operations:
"PostgreSQL 17 introduces a new internal memory structure for vacuum that consumes up to 20x less memory."
Much needed features for backups:
"pg_basebackup, the backup utility included in PostgreSQL, now supports incremental backups and adds the pg_combinebackup utility to reconstruct a full backup"
I'm a huge fan of FDWs and think they're an untapped gem in Postgres, so I love seeing these improvements:
"The PostgreSQL foreign data wrapper (postgres_fdw), used to execute queries on remote PostgreSQL instances, can now push EXISTS and IN subqueries to the remote server for more efficient processing."
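For anyone who hasn't set one up, a minimal postgres_fdw sketch — server, credentials, and table names here are all made up — showing the kind of EXISTS query PG 17 can now push down:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- A hypothetical remote server plus credentials
CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'replica.example.com', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
    OPTIONS (user 'app_ro', password 'secret');

-- Two foreign tables living on the same remote server
CREATE FOREIGN TABLE orders (id int, customer_id int)
    SERVER remote_pg OPTIONS (table_name 'orders');
CREATE FOREIGN TABLE customers (id int, tier text)
    SERVER remote_pg OPTIONS (table_name 'customers');

-- In PG 17 the EXISTS subquery can be shipped to the remote server
-- instead of pulling customer rows back for a local semi-join:
SELECT o.id
FROM   orders o
WHERE  EXISTS (SELECT 1 FROM customers c
               WHERE c.id = o.customer_id AND c.tier = 'gold');
```

Running the last query under EXPLAIN (VERBOSE) shows whether the subquery made it into the remote SQL.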
A bit off topic: can someone suggest how I can learn more about using databases (Postgres specifically) in real-world applications? I am familiar with SQL and common ORMs, but the internet is full of beginner-level tutorials which lack this depth.
Very cool with JSON_TABLE. The style of putting a JSON response (from an API, created from scraping, etc.) into a jsonb column and then writing a view on top to parse / flatten it is something I've been doing for a few years now. I've found it really great to put the JSON into a table, somewhere safe, and then do the parsing there rather than dealing with possible errors on the scripting-language side. I haven't seen this style used in other places before, and seeing it in the docs as a feature of new Postgres makes me feel a bit more sane. Will be cool to try this out and see how it differs from what I was doing before!
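The store-then-flatten workflow described above maps directly onto the new syntax. A sketch, with a made-up raw_responses table holding the scraped payloads:

```sql
-- Raw payloads land here first, untouched
CREATE TABLE raw_responses (
    id      bigserial PRIMARY KEY,
    fetched timestamptz DEFAULT now(),
    doc     jsonb
);

-- The "view on top" that flattens each payload's items array
CREATE VIEW items AS
SELECT r.id AS response_id, jt.*
FROM   raw_responses r,
       JSON_TABLE(r.doc, '$.items[*]'
           COLUMNS (
               name  text    PATH '$.name',
               price numeric PATH '$.price',
               qty   int     PATH '$.qty'
           )) AS jt;
```

If parsing assumptions turn out wrong, the raw document is still sitting in raw_responses, and only the view needs fixing.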
A personal rule of mine is to always separate data receipt+storage from parsing. The retrieval is comparatively very expensive and has few possible failure modes. Parsing can always fail in new and exciting ways.
Disk space to store the returned data is cheap and can be periodically flushed only when you are certain the content has been properly extracted.
A number of features stood out to me in this release:
1. Chipping away more at vacuum. Fundamentally Postgres doesn't have an undo log and therefore has to have vacuum. It's a trade-off of fast recovery vs, well... having to vacuum. The unfortunate part about vacuum is that it adds load to the system exactly when the system needs all its resources. I hope one day people stop knowing that vacuum exists; we are one step closer, but not there.
2. Performance gets better and not worse. Mark Callaghan blogs about MySQL and Postgres performance changes over time, and MySQL keeps regressing performance while Postgres keeps improving.
3. JSON. Postgres keeps improving QOL for interop with JS and TS.
4. Logical replication is becoming a super robust way of moving data in and out. This is very useful when you move data from one instance to another, especially if version numbers don't match. Recently we have been using it to move data at 1 Gb/s.
5. Optimizer. The better the optimizer, the less you think about the optimizer. According to the research community SQL Server has the best optimizer. It's very encouraging that with every release the PG optimizer gets better.
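Point 4 in practice is pleasantly short. A sketch of moving a database between two instances (connection details made up), which also works across major versions:

```sql
-- On the source instance:
CREATE PUBLICATION move_everything FOR ALL TABLES;

-- On the destination instance (the schema must already exist there):
CREATE SUBSCRIPTION move_in
    CONNECTION 'host=old-primary.example.com dbname=app user=repl'
    PUBLICATION move_everything;
```

The subscription does an initial table sync and then streams changes, so you cut over whenever the destination has caught up.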
MySQL can be faster in certain circumstances (mostly range selects), but only if your schema and queries are designed to exploit InnoDB’s clustering index.
But even then, in some recent tests I did, Postgres was less than 0.1 msec slower. And if the schema and queries were not designed with InnoDB in mind, Postgres had little to no performance regression, whereas MySQL had a 100x slowdown.
I love MySQL for a variety of reasons, but it’s getting harder for me to continue to defend it.
I ran into a lot of that 20 years ago, surprised to hear it's still a thing at all given how it's basically common knowledge that most of the Internet and Cloud run on open source software.
I once met an older gentleman who was doing IT work for a defense contractor. He seemed nice enough. We were making small talk and I happened to mention that I had recently installed Linux on my computer at home. His tone changed almost immediately and he started ranting about how Linux was pirated source code, stolen from Microsoft, all of it contains viruses, etc. He was talking about the SCO vs Linux lawsuits but of course got absolutely ALL of the details wrong, like which companies were even involved in the lawsuits. He was so far off the deep end that I didn't even try to correct him, I just nodded and smiled and said I was actually running late to be somewhere else...
What your boss doesn't realize is your business already depends on FOSS. Here are a few examples:
- Do you use any cloud provider? Those platforms are built on top of open source software: Linux, nginx (e.g. Cloudflare's edge servers before the Rust rewrite), HAProxy (AWS ELB), etc.
- The software your business builds, and the software it depends on, probably uses open source libraries (e.g. libc)
- The programming languages your business uses directly or indirectly are probably open source
My point is that folks who make these kinds of statements have no clue how their software is built or what kind of software their business actually depends on.
Well, you can't necessarily trust open source for business software.
The more deeply your business depends on something, the more careful you need to be when selecting the source for that something. (And the db is often very deeply depended on.)
You want to see whether their long-term incentives align with your own needs.
But a revenue stream is just one way to do this, and not a perfect one. (Case in point: Oracle.)
In my experience, SQL Server isn't bad though. I know a couple commercial products that started with SQL Server in the late '90s and remain happy with it now. The cost hasn't been a problem and they like the support and evolution of the product. They find they can adopt newer versions and features when they need to without too much pain/cost/time.
(Not that I don't think Postgres is generally a better green-field pick, though, and even more so porting away from Oracle.)
Wow, yea, the performance gains and new UX features (JSON_TABLE, MERGE improvements, etc) are huge here, but these really stand out to me:
> PostgreSQL 17 supports using identity columns and exclusion constraints on partitioned tables.
> PostgreSQL 17 also includes a built-in, platform independent, immutable collation provider that's guaranteed to be immutable and provides similar sorting semantics to the C collation except with UTF-8 encoding rather than SQL_ASCII. Using this new collation provider guarantees that your text-based queries will return the same sorted results regardless of where you run PostgreSQL.
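If I'm reading the PG 17 docs right, the new provider is selected like this (database and table names made up); the predefined collation is pg_c_utf8:

```sql
-- Whole database on the builtin C.UTF-8 locale:
CREATE DATABASE app
    TEMPLATE template0
    LOCALE_PROVIDER builtin
    BUILTIN_LOCALE 'C.UTF-8';

-- Or opt in per column:
CREATE TABLE users (
    name text COLLATE pg_c_utf8
);
```

No dependency on the OS libc or ICU versions, which is what makes the sort order stable across platforms and upgrades.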
Amazing release, Postgres is a gift that keeps on giving.
I hope to one day see the Incremental View Maintenance extension (IVM) turned into a first-class feature; it's the only thing I need regularly which isn't immediately there!
New Postgres releases are never available on Homebrew the same day, so we all worked hard to make this one available on Neon on day one. If you want to try out the JSON_TABLE and MERGE RETURNING features you can spin up a free instance quickly.
There was LAMP and then MERN and MEAN etc. and then there was Postgres.
It's not quite visible yet, but all this progres by postgres (excuse the pun) on making JSON more deeply integrated with relational principles will surely at some point enable a new paradigm, at least for full-stack web frameworks?
The problem with Postgres's JSON support is that it leads to last-write-wins race conditions compared to actual document stores. Once it can support operations like incrementing a number while someone else updates another field, without locking the row, then maybe.
I don't think you'll see a paradigm shift. You'll just see people using other documents stores less.
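To be fair, you can at least push the read-modify-write into a single statement, so the race moves from the application into the database — though the parent's point stands that the whole row is still locked and rewritten. A sketch (table and key names made up):

```sql
-- Atomic at the statement level: concurrent increments won't lose updates,
-- but each one still takes a row lock and rewrites the whole jsonb document.
UPDATE docs
SET    body = jsonb_set(
           body, '{views}',
           to_jsonb(COALESCE((body->>'views')::int, 0) + 1))
WHERE  id = 42;
```

A document store with field-level operations avoids both the row lock and the full-document rewrite, which is the gap being described.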
We’re already in a new paradigm, one in which web devs are abandoning referential integrity guarantees in favor of not having to think about data modeling.
I’d say they’re then surprised when their DB calls are slow (nothing to do with referential integrity, just the TOAST/DETOAST overhead), but since they also haven’t experienced how fast a DB with local disks and good data locality can be, they have no idea.
Wow, it finally has failover support for logical replication slots. That was the biggest reason why I couldn't depend on logical replication, since handling failover of the master DB was too complex for me to deal with.
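For anyone wanting to try it, my understanding is the PG 17 pieces are a failover option on the subscription plus slot syncing on the standby (names and connection strings made up):

```sql
-- On the subscriber: ask for a failover-capable slot
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary.example.com dbname=app user=repl'
    PUBLICATION app_pub
    WITH (failover = true);

-- On the standby (postgresql.conf), keep the slot synced from the primary:
--   sync_replication_slots = on
--   hot_standby_feedback = on
```

After promotion the standby then has an up-to-date slot and the subscriber can keep streaming from it instead of starting over.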
Loving the continued push for JSON features. I'm going to get a lot of use out of JSON_TABLE. And json_scalar & json_serialize are going to be helpful at times too. JSON_QUERY with OMIT QUOTES is awesome too for some things.
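A few quick examples of those, assuming I have the signatures right:

```sql
-- JSON_QUERY with OMIT QUOTES returns the matched string
-- without the surrounding double quotes
SELECT JSON_QUERY('{"tag": "pg17"}', '$.tag' OMIT QUOTES);

-- json_scalar wraps an SQL scalar as a JSON value
SELECT json_scalar(3.14);

-- json_serialize renders a JSON value back out as text
SELECT json_serialize('{"a": 1}');
```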
I hope SQLite3 can implement SQL/JSON soon too. I have a library of compatibility functions to generate the appropriate JSON operations depending on whether it's SQLite3 or PostgreSQL. It'd be nice to reduce the number of incompatibilities over time.
But, there's a ton of stuff in the release notes that jumped out at me too:
"COPY .. ON_ERROR" ignore is going to be nice for loading data anywhere that you don't care if you get all of it. Like a dev environment or for just exploring something. [1]
Improvements to CTE plans are always welcome. [2]
"transaction_timeout" is an amazing addition to the existing "statement_timeout" as someone who has to keep an eye on less experienced people running SQL for analytics / intelligence. [3]
There's a function to get the timestamp out of a UUID easily now, too: uuid_extract_timestamp(). This previously required a user defined function. So it's another streamlining thing that's nice. [4]
I'll use the new "--exclude-extension" option for pg_dump, too. I just got bitten by that when moving a database. [5]
"Allow unaccent character translation rules to contain whitespace and quotes". Wow. I needed this! [6]
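Quick sketches of the three of these I'll use most (file paths, table names, and the UUID value are made up):

```sql
-- [1] Skip malformed rows instead of aborting the whole load:
COPY staging_events FROM '/tmp/events.csv'
    WITH (FORMAT csv, ON_ERROR ignore);

-- [3] Cap the whole transaction, not just each statement:
SET transaction_timeout = '5min';

-- [4] Embedded timestamp from a v1 or v7 UUID, no UDF needed:
SELECT uuid_extract_timestamp('019212d3-87f4-7d25-902e-b9c39df908ae'::uuid);
```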
> I hope SQLite3 can implement SQL/JSON soon too. I have a library of compatibility functions to generate the appropriate JSON operations depending on whether it's SQLite3 or PostgreSQL. It'd be nice to reduce the number of incompatibilities over time.
I wish Postgres supported Parquet file imports and exports. The COPY command with CSV is really slooooooooow. Even BINARY is quite slow and bandwidth-heavy.
I wonder how open Postgres is and what kind of pull requests the Postgres team considers. I'd like to learn how to contribute to PG in baby steps and eventually get to a place where I could contribute substantial features.
There has been a patch to extend the COPY code with pluggable APIs, adding callbacks at start, end, and for each row processed: https://commitfest.postgresql.org/49/4681/.
I'd guess that this may fit your purpose to add a custom format without having to fork upstream.
I'm not up to date on the recent changes, but the problems we had with vacuum were more computation- and IOPS-related than memory-related.
Basically, in a database with a lot of creation/deletion, activity can outrun the vacuum, leading to out-of-storage errors, lock contention, etc.
In order to keep throughput up, we had to throttle things manually on the input side to allow vacuum to complete. Otherwise throughput would eventually drop to zero.
No, vacuum issues are not solved. This will reduce the amount of scanning needed in many cases when vacuuming indexes. It will mean more efficient, quicker vacuums, which will help in a lot of cases.
There was a big project to re-architect the low level storage system to something that isn't dependent on vacuuming, called zheap. Unfortunately it seems to have stalled and nobody seems to be working on it anymore? I keep scanning the release notes for each new pgsql version, but no dice.
This release is good on so many levels. The performance optimisations and the JSON_TABLE feature alone could have been an entire release by themselves, but we got so much more.
brunoqc|1 year ago
I'm pretty sure I could read them when needed with an FDW. Is that a good idea?
I think it can be slow, but maybe I could use materialized views or something.
ellisv|1 year ago
Do you have any recommendations on how to manage credentials for `CREATE USER MAPPING ` within the context of cloud hosted dbs?
netcraft|1 year ago
It is so nice having JSON functionality in a relational db - even if you never actually store JSON in your database, it's useful in so many situations.
Being able to generate json in a query from your data is a big deal too.
Looking forward to really learning json_table
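Agreed on generating JSON being a big deal. A sketch of building a nested document straight out of relational tables (the users/orders schema is made up):

```sql
SELECT json_build_object(
    'id',     u.id,
    'name',   u.name,
    'orders', (SELECT coalesce(json_agg(
                   json_build_object('sku', o.sku, 'qty', o.qty)), '[]'::json)
               FROM orders o
               WHERE o.user_id = u.id)
) AS user_doc
FROM users u;
```

This lets the database hand a ready-to-serve API payload to the application, one row per user, with no ORM serialization layer in between.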
abyesilyurt|1 year ago
That’s a very good idea!
pestaa|1 year ago
Bit sad the UUIDv7 PR didn't make the cut just yet:
https://commitfest.postgresql.org/49/4388/
0cf8612b2e1e|1 year ago
Where is the iceberg complexity?
(Mark Callaghan's blogging mentioned above: https://x.com/MarkCallaghanDB and https://smalldatum.blogspot.com/)
gigatexal|1 year ago
What’s the rationale? What do you gain?
WuxiFingerHold|1 year ago
The part that your boss doesn't trust Postgres is hilarious, of course.
qianli_cs|1 year ago
Or you could use triggers to build one: https://hypirion.com/musings/implementing-system-versioned-t...
clarkbw|1 year ago
SELECT random(1, 10) AS random_number;
clarkbw|1 year ago
https://neon.tech/blog/postgres-17
(note that not all extensions are available yet, that takes some time still)
[1] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...
[2] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...
[3] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...
[4] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...
[5] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...
[6] https://www.postgresql.org/docs/17/release-17.html#RELEASE-1...
nbbaier|1 year ago
Is this available anywhere? Super interested
pbronez|1 year ago
https://www.postgresql.org/docs/17/functions-json.html#FUNCT...
SOLAR_FIELDS|1 year ago
https://repost.aws/questions/QUbIn2WmXgTbiVu_4wua9tUw/dms-wi...
ssfak|1 year ago
[1]. https://github.com/duckdb/pg_duckdb
paws|1 year ago
Thank you contributors!