If anyone even remotely involved with the maintenance and development of pg reads this thread: thank you for all your efforts in building and improving a first-class product that keeps me amazed at the strides it takes with each major release.
Truly. We've recently moved from Oracle (after using it for 15 years) to PostgreSQL. It's like a breath of fresh air. The documentation for Postgres is unbelievably superior to Oracle's. So far its performance is equal to or better than Oracle's. We had to go through and rewrite thousands of queries, but the SQL syntax of Postgres was always simpler and more logical than the equivalent in Oracle (I think Oracle has too much baggage from being around too long). All in all I'm so impressed by Postgres. I'm sure there are features in Oracle that Postgres doesn't have that keep people on Oracle, but I would imagine that the vast majority of Oracle installs could be moved to Postgres.
Native partitioning and parallel query: honestly, some very welcome quality-of-life improvements, even for use cases outside of what I would consider "Big Data".
> This means that users no longer need to create triggers for routing data; it's all handled by the system.
Trigger routing has always been a performance footgun...to the point that it's sometimes better to handle table routing in application logic. I wonder how the performance of this new "native" partitioning compares to the existing methods (triggers and rules), and whether it makes application routing always inferior.
As far as speed is concerned, we ran some tests a while back and it came back at nearly native performance compared to interacting with a non-partitioned table: https://blog.2ndquadrant.com/pg-phriday-dearly-de-parted/
I worry that having things "all handled by the system" can be even more of a performance footgun. I mean that users will be able to do things that are allowed by the docs, but which result in expensive and hard-to-see things happening under the hood.
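For reference, a minimal sketch of what the new declarative syntax looks like (table and partition names here are made up for illustration):

```sql
-- PostgreSQL 10 declarative partitioning: the parent declares a
-- partition key, and the system routes rows to the right child.
CREATE TABLE measurements (
    logdate date NOT NULL,
    reading numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2017q4 PARTITION OF measurements
    FOR VALUES FROM ('2017-10-01') TO ('2018-01-01');

-- No trigger needed: this row lands in measurements_2017q4.
INSERT INTO measurements VALUES ('2017-11-15', 42.0);
```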
I noticed hash indexes are now crash-safe and replicated -- this seems to make them actually usable in production. In other words, this release effectively "adds" a new index type. That seems like a much bigger deal than is being talked about. Is there any reason to believe that new databases shouldn't be using hash indexes for columns that won't need to support range queries? (In other words, pretty much all keys.) I've seen mixed reports about hash indexes being beneficial.
I think it's a neat feature. But note that ordering isn't just useful for range queries; it also helps to satisfy ORDER BY and allows merge joins without a sort step. There's also not yet support for index-only scans (probably never), no constraint support, and no multi-column index support.
There are also still some performance kinks to work out with the current hash index implementation, largely when growing the index quickly or under very high concurrency.
Equality operations are also cheaper with hash indexes than with btrees because fewer data pages need to be fetched, so this can be a gain for certain columns with high cardinality. It's really worth looking at whether they could be used; you may gain performance with a switch.
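To make the trade-off concrete, a hedged sketch (hypothetical table) of where a hash index applies and where it can't be used:

```sql
-- A hash index stores the 4-byte hash of the key rather than the key
-- itself, which keeps it compact for long values like session tokens.
CREATE TABLE sessions (token text NOT NULL, payload jsonb);
CREATE INDEX sessions_token_hash ON sessions USING hash (token);

-- Can use the hash index (equality only):
SELECT payload FROM sessions WHERE token = 'abc123';

-- Cannot use it; range predicates and ORDER BY need a btree:
SELECT token FROM sessions WHERE token > 'abc' ORDER BY token;
```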
This tiny little mention is a massive win! We now have access to the "und-x-icu" collation. As the postgres docs[1] put it: "ICU root collation. Use this to get a reasonable language-agnostic sort order."
I've been waiting on this forever. Columns containing strings from any/all languages can now be sorted well enough to use postgres in truly multi-language applications. If your application has users writing in English, German, Russian, Japanese, et al, using the "und-x-icu" collation should make your application much more user-friendly (as opposed to using the en_US or C locale).
The ICU root collation is far from a perfect system, but it's the best option available today. I've been tied to MySQL purely for its similarly-purposed utf8mb4_unicode_ci collation. Now I can give Postgres another very serious look.
[1] https://www.postgresql.org/docs/10/static/collation.html
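For anyone wanting to try it, usage is just a COLLATE clause, either per column or per expression (the table name here is hypothetical):

```sql
-- Column-level ICU root collation (requires PostgreSQL 10 built
-- with ICU support):
CREATE TABLE posts (title text COLLATE "und-x-icu");

-- Or applied ad hoc in a query:
SELECT title FROM posts ORDER BY title COLLATE "und-x-icu";
```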
PostgreSQL looks more and more like the "redis of databases".
Whatever the problem, you can almost always find a good reason to use it :)
You get great performance for the simple "dumb" use case (relational data in DB terms, a key/value store in redis' case), and lots of awesome additional features on top of it for more complex situations.
Floating-point timestamps - the only item which affects my work is also one of the few on this wiki page with no description. These were removed with no note about breaking compatibility.
For anyone else using them, I just did a bit of reading [1] and it looks like they are inferior to the default 8-byte integer timestamps. FP timestamps have microsecond precision around the year 2000, but it drops off as you move away from there.
I was using them to match some GPS timing code, but it seems this isn't needed or desirable.
Yeah. You should never use floating point when your use case wants uniform precision across the full range; that's the opposite of what floating point gives you.
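The fall-off is easy to see directly: a float8 timestamp counts seconds from 2000-01-01, and the gap between adjacent doubles grows with the magnitude of the value. A sketch:

```sql
-- ~30 days from the 2000 epoch, the float8 grid is much finer than a
-- microsecond, so adding 1 µs produces a distinct value:
SELECT 2592000::float8 + 0.000001::float8 = 2592000::float8;          -- false

-- ~1000 years out, adjacent doubles are ~7.6 µs apart, so the added
-- microsecond is absorbed entirely:
SELECT 34359738368::float8 + 0.000001::float8 = 34359738368::float8;  -- true
```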
Holy crap, I didn't even know this was feasible, much less in development! Can't wait to test this out, as a significant amount of my data sets have these kinds of relationships.
Parallel query will also be great for certain queries I do regularly. Been looking forward to this!
Thank you, thank you to all the devs who had a part in this!
Cross-column statistics are one of my favorites. The way Postgres 10 handles it is really a _huge_ advancement that addresses a painful edge case that has been around for ages.
Thanks, I'm happy it got into PG10 too. Just to make it very clear - the current implementation is fairly simple, and may not be smart enough to deal with more complicated dependencies between columns. Hopefully that will be improved in PG11.
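For anyone curious, the new syntax looks like this (table and column names are made up): you declare that columns are correlated, and ANALYZE gathers the dependency data the planner then uses.

```sql
-- Without extended statistics the planner multiplies the selectivity
-- of city and zip, badly underestimating row counts when the two
-- columns are functionally dependent.
CREATE STATISTICS addr_dep (dependencies) ON city, zip FROM addresses;
ANALYZE addresses;  -- populates the functional-dependency statistics
```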
As someone working with json output from the gmail api I’m curious to see how people smarter than me take advantage of this new functionality so I can adopt it as well.
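One of the PG10 additions here is full-text search over json/jsonb, which indexes only the string values inside the document. A hedged sketch, assuming a hypothetical table of Gmail API payloads:

```sql
CREATE TABLE messages (id bigserial PRIMARY KEY, body jsonb);

-- GIN index over the text values contained in the JSON documents:
CREATE INDEX messages_fts ON messages
    USING gin (to_tsvector('english', body));

SELECT id FROM messages
 WHERE to_tsvector('english', body) @@ to_tsquery('invoice');
```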
Seems like this might make using passwords compliant with FIPS 140-2. (Not sure, so maybe someone else can share their opinion.) Previously I heard in a few places that people would use LDAP to delegate the auth to something else, e.g. here: https://news.ycombinator.com/item?id=12129906
It's an improvement for sure, but I am curious -- does anyone situate a Postgres instance where it is publicly accessible? Who was asking for this feature?
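For context, the feature in question is SCRAM-SHA-256 password authentication. A minimal sketch of switching to it (the role name is made up):

```sql
-- Store new passwords as SCRAM-SHA-256 verifiers instead of md5:
SET password_encryption = 'scram-sha-256';
CREATE ROLE app_user LOGIN PASSWORD 'change-me';

-- pg_hba.conf must then use the matching auth method, e.g.:
--   host  all  app_user  10.0.0.0/8  scram-sha-256
```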
For anyone using Postgres on a daily basis I would highly recommend pgcli. It has very good auto-complete, and can even suggest JOIN statements (or parts of them, actually).
PostgreSQL keeps getting better and better every time! It's amazing the kind of features you can pull off once you have a solid foundation :) (P.S. I have used MySQL in the past, and MySQL now feels like a stone-age database.)
What is the current best option for Postgres failover? I looked at this at the start of the year, and found lots of options, but all of them seemed to have various drawbacks, and none were natively supported or built-in to Postgres.
I would recommend repmgr [1] over solutions like the suggested DRBD. The simple reason is that DRBD replicates at the block level and doesn't understand Postgres data structures, while repmgr relies on mechanisms provided by Postgres.
So if there is a failure, you're far less likely to learn that all your data is gone because some important block on disk was not replicated and the database is not readable. With repmgr the worst that could happen is that you might lose a few of the latest transactions, but your data should always be consistent and in a working state.
[1] http://repmgr.org/
This is a bit outside my area of expertise. I'm not sure about the best way to do the automated promotion process, but looking through these docs there's mention of a change to libpq that allows you to specify all the potential hosts and then have it connect only to whichever one is accepting writes (target_session_attrs).
Edit to add: I'm also interested to hear the approaches people use here. Today we've been setting up WAL-E to run our db backups so we're in that headspace. I'd actually like to decommission my main db server and provision another one to take its place. Getting the new one up and running from the wal-e archives is easy enough, the switch-over is a little more complex.
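The connection-string form of that libpq change looks roughly like this (host names are hypothetical); libpq tries each host in turn and keeps the first session that is accepting writes:

```
postgresql://db1.example.com:5432,db2.example.com:5432/appdb?target_session_attrs=read-write
```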
Postgres isn't just my favorite database. Postgres is an example for all people about how a project should be run. There are lots of stakeholders who want different things. The dev team looks at what's being asked for and what is reasonable within a certain timeline. And then the team delivers features and supports them. And then on top of that there is best-in-class documentation.
Thank you to all of the team members who put your time and effort into this project. It's not only a wonderful tool that I use every day, it's also a model of how to manage a project.
SQL Server admin and dev here: these features look really compelling. The best part, though, is that a team of devs could run these in production, do some self-support, and basically pay nothing. Not so in SQL Server land.
Does anyone have any use cases where PostgreSQL falls down/loses to other DB systems? I know sharding/replication has long been a sticking point, but what else is there?
Why do people still choose MySQL/MariaDB/Oracle over PostgreSQL at all?
Postgres - Solid single-node relational database with SQL standards support, JSON, fulltext search, custom data types, ACID transactions, foreign data access, and more. HA/scaling through replication and some 3rd party offerings for automatic sharding, failover and multi-master if you need it. Great as the operational source-of-truth for your core data.
Cassandra - Distributed wide-column store (basically advanced key/value) meant to run as a cluster of machines with native support for multiple data centers. Limited "SQL" support which is really only used for defining tables. Data access is very different (everything is an upsert or delete) with variable per-query consistency settings. Great if you need data spread globally, 100% availability, eventually consistent fast key/value access, and some interesting data access patterns.
If you want high-performance Cassandra, look at ScyllaDB first. If you need something in between Cassandra and Postgres, look at CockroachDB.
Ultimately it depends on your use case and how much tuning/schema design/explicit management you are willing to do.
My team lead did a talk at NGINX Conf 2017 which touched on how we tuned Postgres to handle 1 Billion metrics per day for NGINX Amplify (https://www.nginx.com/products/nginx-amplify/). I would link it, but it seems it hasn't been published by our Conf team yet.
We did this on 9.4 so the changes outlined here regarding native partitioning, parallel queries, replication improvements, etc. would likely make doing a similar scaling structure easier and just straight up more performant.
I wouldn't classify 1 Billion metrics a day as "Big Data". I also would point out that our use case is unfairly biased to "recent" data rather than "all historical" data...which allows us to make some query optimizations that wouldn't be acceptable in the latter use case.
But we are using Postgres to host a constant, write-heavy workload that conventional wisdom dictates it is unsuited for. Moreover, it was/is fairly simple to implement, understand, and scale indefinitely. Personally, I think both Redshift and CitusData are various degrees of proof that scaling Postgres to handle these types of workloads is possible.
For me, the reason you want to consider using a Big Data focused tool (such as Cassandra) is at some point you end up fighting your tool to make a square peg fit in a round hole. At some point it makes more sense to adopt a specialized tool rather than work to specialize your implementation of a specific tool.
In all fairness, we are considering just that. Our road map actually has Scylla in the near future. It's not that we don't think Postgres could handle the load, but rather we think it will be easier to move specific data to Scylla and in the long run will save us resources (primarily in man hours and potentially in compute requirements as well).
Even if we move all our timeseries data to a specialized tool, we won't be getting rid of Postgres. For most, if not all, of our other persistence requirements Postgres is more than adequate. This is one of the reasons we chose to start with Postgres: even once we hit a scale where specialized tools made sense, Postgres would still be useful for everything else that doesn't require one.
Moral of the story: Postgres will get you a lot farther than many give it credit for...and if you are just starting out I would highly encourage you to just pick Postgres and see how far it can take you. If you hit the limit, replace that specific part.
Edit: Also, there appears to be some movement on columnar indexes, which is worth keeping an eye on: https://blog.2ndquadrant.com/postgresql-10-roadmap/
`sudo apt-get install postgresql` was amazing to me!
I love it.
[1] https://www.postgresql.org/docs/9.0/static/datatype-datetime...
https://blog.2ndquadrant.com/pg-phriday-crazy-correlated-col...
http://paquier.xyz/postgresql-2/postgres-10-libpq-read-write...
Seriously good work. I'm not sure why my firm still buys Oracle licenses.
Is this essentially a pipe dream?
https://www.postgresql.org/message-id/CAJrrPGdMu8TzxpRZQKK5E...
The PostgreSQL 11 roadmap also mentions a "multi-model" database:
https://wiki.postgresql.org/wiki/Fujitsu_roadmap#Multi-model...
Cassandra is an eventually consistent database. Postgres is ACID.
My 2c: you don't need Cassandra or other Big Data databases. If you did, you probably wouldn't be asking the question.
If it fits on one machine, or a few, it isn't "big data".
[1]https://docs.microsoft.com/en-us/sql/relational-databases/ta...
[2]https://pgxn.org/dist/temporal_tables/
I couldn't tell from the write-up if this is one of the things addressed by native partitioning.