top | item 17963864

b9b10eb736 | 7 years ago

Off the top of my head, I can list at least four reasons to actually use something else:

- Not distributable

- Poor resiliency

- Hard to upgrade

- Schemas remain at the postgres core

Every solution to address any of these points in PG is a hack (bucardo, londiste, slony, pgbouncer, things relying on triggers or proxies). Quite honestly, PG contributors are doing an impressive job. PG is arguably the best relational database on the market as of today and is still improving very fast. However, it's also one of the most (if not the most) bloated. And despite all these features nobody needs, after all these years it's still lacking some very basic yet essential items anyone would ask for in any modern application. It feels like things haven't changed that much since the 90's. DBAs have to hack around things, and root issues never get addressed. Implementing JIT optimizations looks like fun. It will undoubtedly improve some queries by 2x, 4x or even 10x, but it won't be a game changer anyway. It'll just increase the postgres bloat and the overall complexity of the system. Meanwhile, we'll still be lacking essential things that would make PG suitable for pretty much any use-case.

anarazel|7 years ago

People work on the features that they either are paid to work on or are otherwise interested in. While far from a perfect selection mechanism, there's some correlation between what companies are willing to pay people to work on in PG and what their users need.

It may turn out that your essentials and a lot of other users' essentials aren't exactly the same.

I worked on the JIT stuff because there was quite some concern about query speed (yes, from actual users). And after some other micro-optimizations, that was the right place to attack the performance problem.
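What expression JIT buys can be sketched with a toy analogy, in Python rather than Postgres/LLVM internals: instead of walking an expression tree for every row, compile the tree to a function once and call that per row. The tree format and function names here are made up purely for illustration.

```python
# Toy analogy of expression JIT (illustrative only, not Postgres internals).
# Nodes: ("col", name), ("const", value), ("+", left, right), ("*", left, right).

def eval_tree(node, row):
    """Interpreted path: walk the expression tree for every single row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    left = eval_tree(node[1], row)
    right = eval_tree(node[2], row)
    return left + right if kind == "+" else left * right

def compile_tree(node):
    """'JIT' path: translate the tree to source once, reuse the function per row."""
    def emit(n):
        if n[0] == "col":
            return f"row[{n[1]!r}]"
        if n[0] == "const":
            return repr(n[1])
        return f"({emit(n[1])} {n[0]} {emit(n[2])})"
    return eval("lambda row: " + emit(node))

expr = ("+", ("col", "a"), ("*", ("col", "b"), ("const", 10)))  # a + b * 10
row = {"a": 1, "b": 2}
print(eval_tree(expr, row), compile_tree(expr)(row))  # 21 21
```

The speedup Postgres gets has the same shape: the per-row interpretation overhead is paid once, at plan time, instead of on every tuple.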

If you have strong feelings about what you want worked on, you'd be more than welcome to help. Alternatively, some of the dev companies around PG would certainly appreciate the business.

b9b10eb736|7 years ago

I totally upvote and agree with your comment. Thanks a lot for working on JIT, that's invaluable. We (as a community) will never thank the PG contributors enough for their work. I'd love to contribute to the postgres project, though I can't make that happen at work at the moment, unfortunately. And the cost of entry is quite high for what I can give in my free time. I'm very sorry about this. I know well that it's always easier to complain about things than to fix them. That doesn't mean there's no substance in the complaint. I hope my comment did not hurt you; that was not my intention.

grzm|7 years ago

> "- Not distributable" This is something that's actively being worked on. It's also useful to look at the systems that are more easily distributable, and compare other features they posses with Postgres. Aphyr has provided a wonderful service with Jepsen showing just how difficult correctness in general (with and without the added complexity of distributability) actually is.

- "Poor resiliency" Do you mean something by this beyond as an aspect of distributability?

- "Hard to upgrade" The story here has been much improved, both with pg_upgrade and though features such as logical replication.

- "Schemas remains at postgres core" Would you elaborate what you mean by this?

I would also ask you to elaborate on what you mean by "However it's also one of the most (if not the most) bloated." In terms of binary size? Feature set? Actual on-disk data representation? And what are you comparing it to?

Similarly, "It feels like things haven't changed that much since the 90's.": Having used Postgres over the years, I can certainly say that things have become much easier. What in particular are you thinking of?

I also struggle to square what appear to be complaints about missing capabilities with frustration about "postgres bloat and overall complexity of the system." As with all things, there are tradeoffs and resource limitations. What in particular are you looking for? And given the nature of Postgres as an open source project, what are you doing to ensure that the features you desire are being worked on? There are plenty of developers and organizations that are willing to work on bespoke features given the appropriate support.

Jepsen: https://jepsen.io

b9b10eb736|7 years ago

Thanks for replying. Honestly, I expected a more closed-minded and rude answer, because I know my opinion on this is quite unpopular. The context of my answer: I love PostgreSQL; I've been using it in production, day-to-day, at work and for personal projects. Still, I have also used more modern databases, and they fix some flaws postgres has (while having a lot of other flaws, but that's not the point). I was answering the question "what would make anybody choose another database given that Postgres can do pretty much anything?".

> This is something that's actively being worked on. It's also useful to look at the systems that are more easily distributable, and compare the other features they possess with Postgres. Aphyr has provided a wonderful service with Jepsen, showing just how difficult correctness in general (with and without the added complexity of distributability) actually is.

Thanks. I'll definitely have a look into this. I'm definitely not saying that everything would be easy or ideal to implement in such a robust and mature piece of software (I'm even sure it would be a nightmare for the developers to build and stabilize, and that for this reason it's unlikely ever to happen). Other databases provide this out of the box; postgres just doesn't, and that's a good answer to "why would anyone use anything else". It's not a requirement for everybody, but it's a more frequent need than JIT, IMO.

> Do you mean something by this beyond its being an aspect of distributability?

I mean that physical replication is great, but you always get into trouble when it comes to failover. Pretty much any modern DB system can be put in front of a basic HTTP LB and replicate data across n nodes. It's indeed tightly coupled with distribution, but I meant distribution in terms of performance and sharding (horizontal scalability), and resiliency in terms of reliability. Relying on DNS for failover induces downtime; relying on a proxy is a hack. I'm wondering why it would be so hard to fix this in postgres, though.
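For what it's worth, since PG 10 libpq does accept a list of hosts with `target_session_attrs=read-write` (e.g. `postgresql://db1,db2,db3/app?target_session_attrs=read-write`), which moves part of this failover decision into the client. The strategy amounts to something like the toy sketch below, where `probe` and the host names are hypothetical stand-ins for connecting and checking `SELECT pg_is_in_recovery()`:

```python
# Toy sketch of client-side failover: try hosts in order, keep the first one
# that reports itself as the writable primary. All names are illustrative.

def pick_primary(hosts, probe):
    for host in hosts:
        try:
            if probe(host):          # True -> host is the writable primary
                return host          # False -> reachable, but a read-only replica
        except ConnectionError:
            continue                 # host down: try the next candidate
    raise RuntimeError("no writable primary reachable")

# Hypothetical cluster state after a failover: db1 died, db2 was promoted.
state = {"db1": None, "db2": True, "db3": False}  # None = down, False = replica

def probe(host):
    if state[host] is None:
        raise ConnectionError(host)
    return state[host]

print(pick_primary(["db1", "db2", "db3"], probe))  # db2
```

This still leaves promotion itself (deciding that db2 becomes the primary) to an external tool, which is exactly the gap the comment is pointing at.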

> The story here has been much improved, both with pg_upgrade and through features such as logical replication.

This is definitely true. While I think logical replication could/should go even further to really fix the upgrade issue, PG 10 is a huge move in the right direction. Thanks for pointing that out.

> "Schemas remains at postgres core" Would you elaborate what you mean by this?

Maybe it's more a general feeling than something really concrete, but for instance, would you create a table with only one JSONB column? It would feel weird, like abusing/hacking around the schema. From the user's perspective, it's impractical and unnatural, and it doesn't look like a document store at all (as we see in nosql DBs). In this respect, I don't think I will ever use postgres as a document-only database, because I feel like that's just not how postgres is built (unless I'm missing some great feature?). That's a subject where document-only databases may have a point.
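For reference, the pattern being described is a one-column table like `CREATE TABLE docs (doc jsonb);` queried through operators such as `doc->>'status'`. A minimal stand-in using only Python's stdlib `json` shows what that access pattern amounts to (the key names and documents are made up):

```python
import json

# Each "row" is just a serialized document, as in a single-JSONB-column table.
rows = [
    json.dumps({"id": 1, "status": "open", "tags": ["bug"]}),
    json.dumps({"id": 2, "status": "closed"}),
]

def where(rows, key, value):
    """Rough analogue of: SELECT doc FROM docs WHERE doc->>'key' = 'value'."""
    return [d for d in (json.loads(r) for r in rows) if str(d.get(key)) == value]

print([d["id"] for d in where(rows, "status", "open")])  # [1]
```

Postgres can index this (GIN indexes on jsonb), but the ergonomic point above stands: the relational layer is wrapped around the document rather than designed for it.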

> In terms of binary size? Feature set? Actual on-disk data representation? If so, what are you comparing it to?

I'm only speaking in terms of feature set. A lot of documentation comes with those features. A lot of side effects. A lot of wrong expectations from users. A lot of complexity. A lot of maintenance burden, which makes postgres harder to evolve on the big topics.

Also, this is a common problem across the most used relational DBMSs (MySQL, Oracle, SQL Server and Postgres). To give another perspective, Elasticsearch dropped a massive number of features in version 6 precisely so that they could focus on the core. I personally think it was a good move.

> Having used Postgres over the years, I can certainly say that things have become much easier. What in particular are you thinking of?

I have also used PG for years, and while I reckon you are completely right (a lot of improvements have been made in a tremendous number of places), I was pointing out that the core topics I listed are still such a pain to handle, despite having been important since day 1.

> I also struggle to square what appear to be complaints about missing capabilities with frustration about "postgres bloat and overall complexity of the system." As with all things, there are tradeoffs and resource limitations. What in particular are you looking for?

Ok, let me fix this: I'm not complaining at all, really not. The PG folks are doing a fantastic job; PG is great software, no doubt about it. I would even go further: in my opinion, postgres is currently the best mature RDBMS on the market. I am just saying that postgres is not, and probably never will be, the answer to all the DB use-cases in the world. More than that: postgres is still failing at some very basic things needed in almost every modern app (mostly because it was started at a time when HA was not really a concern). Put another way: some modern DBs solve some modern problems better than postgres, and that might disqualify postgres for some of these use-cases. I did not mean anything else, believe me, and I sincerely apologize to all the developers who have worked on the project if my comment came across like that.

minxomat|7 years ago

> And despite all these features nobody needs

Please do speak for yourself.

b9b10eb736|7 years ago

"nobody needs" is to take in the unix philosophy sense here. If you take it first degree, you're part of the reason why PG has been missing these essential points for so long (IMHO). In comparison to the items I've listed, those features are nice to have but definitely not used by the majority of users. And I do use these advanced features (nobody needs) in PostgreSQL. I use them extensively and really love them. It's just that I could workaround them easily and properly. That's just not the case for the things I've listed.

threeseed|7 years ago

Shame comments like this get downvoted. Because it's 100% true.

PostgreSQL is still lacking in its horizontal scalability story, and right now there is no reason to choose it over something like Solr/Elasticsearch, largely because of this.

digitalzombie|7 years ago

You're the reason why I inherit projects that use only ES and end up building relational schemas in ES, because they didn't use a relational database at all.

If you had to choose between PostgreSQL or ElasticSearch then you have no idea what you're doing.

jeltz|7 years ago

If it is 100% true, then can you explain what he is referring to with "Poor resiliency" and "Schemas remain at the postgres core"?

Also, PostgreSQL is not hard to upgrade; the only issue is that it is hard to upgrade with zero downtime (pg_upgrade is very fast but still requires a small amount of downtime).

I can agree with the complaint about distribution, but that is a common issue for most databases not built from the start around distribution and nothing I would call "fundamentals".

mmt|7 years ago

Sometimes the issue with a comment isn't its truth or lack thereof, but its tone or potential for inciting a flamewar (a.k.a. trollishness). With a brand new account, the intent is even harder to discern.

> PostgreSQL still is lacking with its horizontal scalability story

To the extent this is true, it may still be irrelevant for many environments. The ability to increase performance by just adding more servers (which is what I believe is usually meant by "horizontal scalability") is not, despite its popularity, a desirable end in and of itself.

It's only desirable as a means of scalability, if scaling a "single" server (a.k.a. vertical) would reach the limit of available hardware, or if even approaching that limit becomes prohibitively expensive.

My impression from talking to managers (including startup founders) is that they often erroneously believe they fall into the latter category. Specifically, they believe that the overhead inherent in distributed systems is essentially negligible, so that they're avoiding the 2-4x price premium of a high-end single system. The reality is not so negligible, especially considering inefficiencies unique to a particular distributed database, whether or not those are inescapable consequences of its design.
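To make that tradeoff concrete with purely hypothetical numbers (the 3x premium, 4x throughput, and 60% efficiency below are illustrative, not measurements):

```python
import math

# One high-end box: 3x the price of a commodity node, 4x its raw throughput.
big_box_price = 3.0
big_box_throughput = 4.0

# Commodity cluster: each node keeps only 60% of its raw throughput once
# coordination/replication overhead is paid -- the part that gets waved away.
efficiency = 0.6

# Nodes needed to match the big box's effective throughput.
nodes_needed = math.ceil(big_box_throughput / efficiency)
print(nodes_needed, float(nodes_needed), big_box_price)  # 7 7.0 3.0
```

Under these assumptions the "cheap commodity" cluster costs more than twice the big box. Whether it actually does depends entirely on the overhead figure, which is the number worth measuring before committing either way.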

The former can, of course, be a real concern, especially for VC-funded startups, for whom a lack of fast enough growth is its own failure. However, even in that situation, there's reason for skepticism, since most of the original "web scale" horror stories tended to involve MySQL (which had, and maybe still has, reduced capabilities for performance improvements through hardware) and database servers that hadn't even maxed out their I/O capabilities (not trivial to do naively pre-SSD).

> there is no reason to choose it over something like Solr/ElasticSearch right now largely because of this.

Even the OC recognizes [1] that these distributed systems aren't without their drawbacks. Those drawbacks are additional reasons to choose Postgres over them.

[1] https://news.ycombinator.com/item?id=17964732

brightball|7 years ago

Distribution is also very application-specific, in terms of how you distribute. If you're going multi-tenant, Citus (the article's source) works beautifully.