New in PostgreSQL 10

[+] alceta|8 years ago|reply

If anyone even remotely involved with the maintenance and development of pg reads this thread - Thank you! - for all your efforts in building and improving a first class product that keeps me amazed at the strides it takes with each major. release.

[+] irrational|8 years ago|reply

Truly. We've recently moved from Oracle (after using it for 15 years) to Postgresql. It's like a breath of fresh air. The documentation for Postgres is unbelievably superior to Oracle. So far its performance is equal to or better than Oracle. We had to go through and rewrite thousands of queries, but the sql syntax of Postgres was always simpler and more logical than the equivalent in Oracle (I think Oracle has too much baggage from being around too long). All in all I'm so impressed by Postgres. I'm sure there are features in Oracle that Postgres doesn't have that keep people on Oracle, but I would imagine that the vast majority of Oracle installs could be moved to Postgres.

[+] kentt|8 years ago|reply

Seconded. Postgres is such a high quality project in so many ways. I'm rarely left wanting when using it, and very often pleasantly surprised.

[+] jackmott|8 years ago|reply

agreed. after years of having used Oracle and SQL Server at work, to find I could have a working database in about 10 seconds with

sudo apt-get install postgresql

was amazing to me!

[+] tikhon|8 years ago|reply

And they didn't even ICO

[+] jorgeleo|8 years ago|reply

Third!

[+] gshulegaard|8 years ago|reply

Just...wow.

* Native partitioning

* Parallel query

Honestly, some very welcome quality of life improvements for use cases even outside of what I would consider "Big Data".

> This means that users no longer need to create triggers for routing data; it's all handled by the system.

Trigger routing has always been a performance foot gun...to the point that it's sometimes better to handle table routing in application logic. I wonder what the performance is comparatively between this new "native" partitioning and existing methods (triggers and rules) and whether or not this makes application routing always inferior.

[+] bonesmoses|8 years ago|reply

So far as the speed is concerned, we ran some tests a while back and it came back as nearly native performance when compared to interacting with a non-partitioned table:

https://blog.2ndquadrant.com/pg-phriday-dearly-de-parted/

[+] koolba|8 years ago|reply

Parallel query was already there in 9.6 though this release improves on it further.

[+] adrianratnapala|8 years ago|reply

I worry that having things "all handled by the system" can be even more of a performance footgun. I mean that users will be able to do things that are allowed by the docs, but which result in, expensive and hard-to-see things are happening under the hood.

[+] gfodor|8 years ago|reply

I noticed hash indexes are now crash proof and replicated -- this seems to make them actually usable in production. In other words, this release effectively "adds" a new index type. That seems like a much bigger deal than is being talked about, is there any reason to believe that new databases shouldn't be using hash indexes for columns that won't be supporting range queries? (In other words, pretty much all keys.) I've seen mixed reports about hash indexes being beneficial.

[+] anarazel|8 years ago|reply

I think it's a neat feature. But note that orderedness isn't just useful for range queries, it also helps to satisfy ORDER BY and to allow for merge joins without a sort steps. There's also no yet support for index-only scans (probably never), no constraints, and no multi-column index support.

There's also still some performance kinks to work out with the current hash index performance - large when growing the index quickly or under very high concurrency.

[+] ioltas|8 years ago|reply

Equality operations are also cheaper with hash indexes than btree because less data pages need to be fetched, so this can be a gain for certain columns with a high cardinality. You should really look at if those could be used or not, you may gain in performance with a switch.

[+] developer2|8 years ago|reply

> ICU Collation Support

This tiny little mention is a massive win! We now have access to the "und-x-icu" collation. As the postgres docs[1] put it: "ICU root collation. Use this to get a reasonable language-agnostic sort order."

I've been waiting on this forever. Columns containing strings from any/all languages can now be sorted well enough to use postgres in truly multi-language applications. If your application has users writing in English, German, Russian, Japanese, et al, using the "und-x-icu" collation should make your application much more user-friendly (as opposed to using the en_US or C locale).

The ICU root collation is far from a perfect system, but it's the best option available today. I've been tied to MySQL purely for its similarly-purposed utf8mb4_unicode_ci collation. Now I can give postgres another very serious look.

[1] https://www.postgresql.org/docs/10/static/collation.html

[+] IMTDb|8 years ago|reply

PgSQL more and more looks like the "redis of databases". Whatever the problem, you can almost always find a good reason to use it :)

You get great performance for the simple "dumb" use case (relational data in DB term, key/value store in redis case), and lots of awesome additional feature on top of it for more complex situations.

I love it.

[+] qume|8 years ago|reply

Floating point timestamps - the only item which affects my work is also one of the few in this wiki page with no description. These were removed with no breaking compatibility.

For anyone else using them, I just did a bit of reading [1] and it looks like they are inferior to default 8 byte timestamps. FP timestamps have microsecond precision around the year 2000, but it drops off as it moves away from there.

I was using them to be the same as some GPS timing code, but it seems this isn't needed or desirable.

[1] https://www.postgresql.org/docs/9.0/static/datatype-datetime...

[+] chowells|8 years ago|reply

Yeah.. You should never use floating point when your use case wants uniform precision across the full range. That's the opposite of what floating point gives you.

[+] pbnjay|8 years ago|reply

> Cross-column Statistics

Holy crap I didn't even know this was feasible much less in development! Can't wait to test this out, as significant amount of my data sets have these kinds of relationships.

Parallel query will also be great for certain queries I do regularly. Been looking forward to this!

Thank you Thank you to all the devs that had a part of this!

[+] bonesmoses|8 years ago|reply

Cross-column statistics are one of my favorites. The way Postgres 10 handles it is really a _huge_ advancement that addresses a painful edge case that has been around for ages.

https://blog.2ndquadrant.com/pg-phriday-crazy-correlated-col...

[+] pgaddict|8 years ago|reply

Thanks, I'm happy it got into PG10 too. Just to make it very clear - the current implementation is fairly simple, and may not be smart enough to deal with more complicated dependencies between columns. Hopefully that will be improved in PG11.

[+] TCM|8 years ago|reply

Full text search in JSON & JSONB looks exciting. https://wiki.postgresql.org/wiki/New_in_postgres_10#Full_Tex...

[+] godzillabrennus|8 years ago|reply

As someone working with json output from the gmail api I’m curious to see how people smarter than me take advantage of this new functionality so I can adopt it as well.

[+] TCM|8 years ago|reply

SCRAM authentication looks like a nice security improvement also. No description on the site but http://paquier.xyz/postgresql-2/postgres-10-scram-authentica... gives a good overview.

[+] pjungwir|8 years ago|reply

Seems like this might make using passwords compliant with FIPS 140-2. (Not sure, so maybe someone else can share their opinion.) Previously I heard in a few places that people would use LDAP to delegate the auth to something else, e.g. here: https://news.ycombinator.com/item?id=12129906

[+] xyzzy_plugh|8 years ago|reply

It's an improvement for sure, but I am curious -- does anyone situate a Postgres instance where it is publicly accessible? Who was asking for this feature?

[+] wooptoo|8 years ago|reply

For anyone using postgres on a daily basis I would highly recommend pgcli. It has a very good auto-complete, and can even suggest JOIN statements (parts of it actually).

[+] maxpert|8 years ago|reply

Postgresql keeps getting better and better everytime! It's amazing the type of features you can pull off once you have a solid foundation :) (P.S. Have been using MySQL in past and MySQL now feels like stoneage database)

[+] dantiberian|8 years ago|reply

What is the current best option for Postgres failover? I looked at this at the start of the year, and found lots of options, but all of them seemed to have various drawbacks, and none were natively supported or built-in to Postgres.

[+] takeda|8 years ago|reply

I would recommend repmgr[1] over solutions like DRBD suggested. The simple reason is that DRBD replicates on block level and doesn't understand postgres data structures, while repmgr relies on mechanisms provided by Postgres.

So if there is a failure, you're far less likely to learn that all your data is gone, because some important block on disk was not replicated and the database is not readable. With repmgr the worst what could happen is that you might lost few latest transactions, but your data should always be consistent and in working state.

[1] http://repmgr.org/

[+] aidos|8 years ago|reply

This is a bit outside my area of expertise. Not sure about the best way to do the automated promotion process, but looking through these docs there's mention of a change to libpq that allows you to specify all the potential hosts and then have it only connect to whichever one is accepting writes (target_session_attrs).

http://paquier.xyz/postgresql-2/postgres-10-libpq-read-write...

Edit to add: I'm also interested to hear the approaches people use here. Today we've been setting up WAL-E to run our db backups so we're in that headspace. I'd actually like to decommission my main db server and provision another one to take its place. Getting the new one up and running from the wal-e archives is easy enough, the switch-over is a little more complex.

[+] rb808|8 years ago|reply

Shouldn't that be PostgreSQL X?

Seriously good work. I'm not sure why my firm still buys Oracle licenses.

[+] ianamartin|8 years ago|reply

Postgres isn't just my favorite database. Postgres is an example for all people about how a project should be run. There are lots of stakeholders who want different things. The dev team looks at what's being asked for and what is reasonable within a certain timeline. And then the team delivers features and supports them. And then on top of that there is best-in-class documentation.

Thank you to all of the team members who put your time and effort into this project. It's not only a wonderful tool that I use every day, it's also a model of how to manage a project.

[+] gigatexal|8 years ago|reply

SQLSERVER admin and dev here: these features look really compelling. Though the best part is a team of devs could run these in production and do some self support and basically pay nothing. Not so in SQLSERVER land.

[+] hardwaresofton|8 years ago|reply

Does anyone have any use cases where PostgreSQL falls down/loses to other DB systems? I know sharding/replication has long been a sticking point, but what else is there?

Why do people still choose MySQL/MariaDB/Oracle over PostgreSQL at all?

[+] no1youknowz|8 years ago|reply

Does anyone know whatever happened to the columnar storage work that 2nd quadrant posted on their blog: https://blog.2ndquadrant.com/postgresql-10-roadmap/

Is this essentially a pipe dream?

[+] petereisentraut|8 years ago|reply

Not a priority for 2Q at the moment. Still general interest in the community, but hard to predict right now.

[+] heliosAtwork|8 years ago|reply

There seems to be an in progress patch for pluggable storage:

https://www.postgresql.org/message-id/CAJrrPGdMu8TzxpRZQKK5E...

The PostgreSQL 11 roadmap also mentions "multi-model" database

https://wiki.postgresql.org/wiki/Fujitsu_roadmap#Multi-model...

[+] kbaker|8 years ago|reply

See this discussion also: https://news.ycombinator.com/item?id=15258166

[+] davedx|8 years ago|reply

How does PostgreSQL 10 compare with Cassandra for BigData requirements?

[+] manigandham|8 years ago|reply

They are completely different:

Postgres - Solid single-node relational database with SQL standards support, JSON, fulltext search, custom data types, ACID transactions, foreign data access, and more. HA/scaling through replication and some 3rd party offerings for automatic sharding, failover and multi-master if you need it. Great as the operational source-of-truth for your core data.

Cassandra - Distributed wide-column (basically advanced key/value) meant to run as a cluster of machines with native support for multiple data centers. Limited "SQL" support which is really only used for defining tables. Data access is very different (everything is an upsert or delete) with variable per-query consistency settings. Great if you need data spread globally, 100% availability, eventual consistency fast key/value, and some interesting data access patterns.

If you want high-performance Cassandra, look at ScyllaDB first. If you need something in between Cassandra and Postgres, look at CockroachDB.

[+] noncoml|8 years ago|reply

By no means an expert, but isn’t it Apples and Oranges?

Cassandra is an eventual consistent database.

Postgres is ACID.

My 2c: you don’t need Cassandra or other BigData databases. If you did, you probably wouldn’t ask the question.

[+] SEJeff|8 years ago|reply

Define "big data"? You can buy normal tier 1 server vendor hardware with 6T of memory.

If it fits into 1 machine or a few, it isn't "big data".

[+] gshulegaard|8 years ago|reply

Ultimately it depends on your use case and how much tuning/schema design/explicit management you are willing to do.

My team lead did a talk at NGINX Conf 2017 which touched on how we tuned Postgres to handle 1 Billion metrics per day for NGINX Amplify (https://www.nginx.com/products/nginx-amplify/). I would link it, but it seems it hasn't been published by our Conf team yet.

We did this on 9.4 so the changes outlined here regarding native partitioning, parallel queries, replication improvements, etc. would likely make doing a similar scaling structure easier and just straight up more performant.

I wouldn't classify 1 Billion metrics a day as "Big Data". I also would point out that our use case is unfairly biased to "recent" data rather than "all historical" data...which allows us to make some query optimizations that wouldn't be acceptable in the latter use case.

But we are using Postgres to host a constant, write-heavy workload that conventional wisdom dictates it is unsuited for. Moreover, it was/is fairly simple to implement, understand, and scale indefinitely. Personally, I think both Redshift and CitusData are various degrees of proof that scaling Postgres to handle these types of workloads is possible.

For me, the reason you want to consider using a Big Data focused tool (such as Cassandra) is at some point you end up fighting your tool to make a square peg fit in a round hole. At some point it makes more sense to adopt a specialized tool rather than work to specialize your implementation of a specific tool.

In all fairness, we are considering just that. Our road map actually has Scylla in the near future. It's not that we don't think Postgres could handle the load, but rather we think it will be easier to move specific data to Scylla and in the long run will save us resources (primarily in man hours and potentially in compute requirements as well).

Even if we move all our timeseries data to a specialized tool, we won't be getting rid of Postgres. For most/if not all our other persistence requirements Postgres is more than adequate. This is one of the reasons we chose to start with Postgres: even once we hit a scale that specialized tools made sense, Postgres would still be useful for everything else that doesn't require a specialized tool.

Moral of the story: Postgres will get you a lot farther than many give it credit for...and if you are just starting out I would highly encourage you to just pick Postgres and see how far it can take you. If you hit the limit, replace that specific part.

Edit: Also, there appears to be some movement on Columnar indexes which is worth keeping an eye on:

https://blog.2ndquadrant.com/postgresql-10-roadmap/

[+] snissn|8 years ago|reply

Does Cassandra allow aggregate functions like sum() or avg() yet?

[+] Chickenosaurus|8 years ago|reply

I am interested in the answer to this question, too.

[+] jhoechtl|8 years ago|reply

Very exciting. Still badly miss (bi-)temporal tables and native query support [1] without using extensions [2].

[1]https://docs.microsoft.com/en-us/sql/relational-databases/ta...

[2]https://pgxn.org/dist/temporal_tables/

[+] odammit|8 years ago|reply

Does anyone know if the partitioning can enforce a uniqueness constraint across child tables?

I couldn’t tell from the write up if this is one of the things addressed with Native Partitioning

[+] Tostino|8 years ago|reply

No it can't currently. There has been some discussion on how to do that for pg 11 though.

[+] tomc1985|8 years ago|reply

Seconded, this is a pretty important feature for some use cases

[+] martin_marques|8 years ago|reply

No. You need Global indexes for that.

249 comments