- we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows if it's just hardware or software or what behind the scenes here)
- we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood
- lastly, with this news, looks like Microsoft is now doubling down on the same strategy to build out infrastructure and _possibly_ closed source "forked" wins on top of the beautiful open source world that is Postgres
Please, please, please let's be sure to upstream! I love the cloud, but when I go to "snapshot" and "restore" my PG DB I want a little transparency into how y'all are doing this. Same with DocumentDB; I'd love an article on how they are using JSONB indices at this supposed scale! Not trying to throw shade; just raising my eyebrows a little.
Craig here from Citus. We're actually a bit different than past forks. Many years ago Citus itself was a fork, but about 3 years ago we became a pure extension[1]. This means we hook into lower level extension APIs[2] that exist within Postgres and are able to stay current with the latest Postgres versions.
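The idea of "hooking into lower level extension APIs" is easier to picture with a toy model. Postgres exposes global function-pointer hooks such as `planner_hook`; an extension typically saves the previous value in its `_PG_init` and installs its own function, falling back to `standard_planner` for queries it doesn't handle. The Python below is only an analogy of that chaining pattern, not Citus or Postgres code: `plan`, `install_distributed_planner`, the table names and the plan strings are all hypothetical.

```python
from typing import Callable, Dict, Optional, Set

# Toy stand-ins: Postgres really passes a Query* parse tree and returns a PlannedStmt*.
Query = Dict[str, str]
Plan = str
PlannerHook = Callable[[Query], Plan]


def standard_planner(query: Query) -> Plan:
    """Stand-in for the built-in planner that ships with Postgres."""
    return f"SeqScan({query['table']})"


# Global hook slot, analogous to Postgres's planner_hook (unset by default).
planner_hook: Optional[PlannerHook] = None


def plan(query: Query) -> Plan:
    """Core entry point: call the installed hook if there is one."""
    if planner_hook is not None:
        return planner_hook(query)
    return standard_planner(query)


def install_distributed_planner(distributed_tables: Set[str]) -> None:
    """Mimics an extension's init step: save the old hook, then install its own."""
    global planner_hook
    prev_hook = planner_hook  # remembered so hooks from several extensions can chain

    def distributed_planner(query: Query) -> Plan:
        if query["table"] in distributed_tables:
            return f"DistributedScan({query['table']})"
        # Not a distributed table: defer to the previous hook or the standard planner.
        if prev_hook is not None:
            return prev_hook(query)
        return standard_planner(query)

    planner_hook = distributed_planner
```

After `install_distributed_planner({"events"})`, `plan({"table": "events"})` returns `"DistributedScan(events)"` while `plan({"table": "users"})` still falls through to `"SeqScan(users)"`. Because the core only calls through the hook, the extension never has to patch the core source, which is what lets it track new Postgres releases instead of maintaining a fork.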
If the creators of Postgres wanted all improvements to be upstreamed, they wouldn’t have released under a permissive license. The ability to use Postgres commercially without exposing your entire codebase to copyleft risk is one of the reasons it’s used commercially in the first place.
Amazon Aurora doesn't have much to do with Postgres and is a custom storage subsystem used by many different database engines. Aurora Postgres is actually using Postgres code on top to handle queries, and eventually PG itself will get pluggable storage engines.
It's similar with Redshift although it's a much older codebase from the v8 branch with more customizations. The changes are very specific to their infrastructure and wouldn't help anyone else since it's not designed as an on-prem deployable product.
There's also no confirmation that DocumentDB runs on Postgres, and it's most likely a custom interface layer they wrote themselves. If you just want MongoDB on Postgres, there are already open source projects that do it.
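As a rough idea of how a MongoDB-style layer can sit on top of Postgres JSONB, here is a minimal Python sketch that turns an equality-only `find()` filter into a query using the jsonb containment operator `@>`. It assumes a hypothetical table per collection with a single jsonb column named `data`; the function name and table layout are illustrative and not taken from any particular open source project.

```python
import json
from typing import Any, Dict, List, Tuple

SCALARS = (str, int, float, bool, type(None))


def mongo_find_to_sql(collection: str, filter_doc: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Translate an equality-only, MongoDB-style find() filter into a query
    that uses Postgres's jsonb containment operator (@>).

    Assumes a hypothetical table per collection with one jsonb column `data`.
    Only top-level keys with scalar values are supported; operators such as
    $gt/$in, dotted paths and nested documents are rejected rather than
    half-translated, because their semantics differ from plain containment.
    """
    if not collection.isidentifier():
        raise ValueError(f"unsafe collection name: {collection!r}")
    for key, value in filter_doc.items():
        if key.startswith("$") or "." in key or not isinstance(value, SCALARS):
            raise NotImplementedError(f"unsupported filter on {key!r}")
    # %s is the placeholder style used by drivers such as psycopg2.
    sql = f'SELECT data FROM "{collection}" WHERE data @> %s::jsonb'
    # The filter document itself becomes the containment pattern.
    return sql, [json.dumps(filter_doc, sort_keys=True)]
```

A GIN index built with the `jsonb_path_ops` operator class can serve `@>` lookups like this one. Anything beyond simple equality (ranges, array semantics, projections, aggregation pipelines) is where such a translation layer gets much harder.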
Kudos to Azure for open-sourcing so much of what they do: lots of Kubernetes work, including AKS Engine, which runs their k8s implementation; their machine learning toolkit; media services (face ID, etc.) as a container. The whole Azure shebang runs on Service Fabric, which they've also open-sourced.
It's a differentiator for some of their workloads: you don't have to hand your business over to a black box.
Aurora databases and DocumentDB share the same underlying reliable single-writer, many-reader block device for storage. That is all the magic. Not sure where you got the idea that DocumentDB has Postgres underneath it.
I get what you're saying, but BSD-style licenses are specifically designed to allow changes that are never sent upstream. I don't understand why people moan about companies complying with the license agreement.
Amazon has explained in their re:Invent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to the latest version on a slightly slower cadence). And there’s no benefit to getting the Aurora layer upstream; no one else could use it anyway.
Citus is an extension, not a fork.
So neither of these projects is doing Postgres a disservice. Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.
This is the future and it's not just big companies doing it.
Virtually all of the companies built on open source products in the past few years have stopped focusing on being the best place to run that open source program, and have instead held back performance and feature improvements as proprietary rather than pushing them back upstream.
> - we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows if it's just hardware or software or what behind the scenes here)
The performance benefits of Aurora over Postgres are mostly because Amazon rewrote the storage engine to run on top of their infrastructure.
> - we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood
To clarify, Amazon DocumentDB uses the Aurora storage engine, which is the same proprietary storage engine that is used by Aurora MySQL and Aurora PostgreSQL, and gives you multi-facility durability by writing 6 copies of your data across 3 facilities, with a 4 of 6 quorum before writes are acknowledged back to the client.
So, it's a bit inaccurate to say that DocumentDB has anything to do with Postgres.
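The durability numbers in this comment (6 copies, 3 facilities, a 4-of-6 write quorum) can be checked with a little arithmetic. The Python sketch below assumes two copies per facility and models nothing but counting; it says nothing about how Aurora's storage layer is actually implemented, and read quorums are left out because the comment doesn't mention them.

```python
from itertools import combinations
from typing import Set

COPIES = 6        # copies of each write
FACILITIES = 3    # facilities the copies are spread across
WRITE_QUORUM = 4  # acknowledgements required before the client is told "committed"

# Assumed placement: copies 0-1 in facility 0, 2-3 in facility 1, 4-5 in facility 2.
PER_FACILITY = COPIES // FACILITIES
PLACEMENT = {copy: copy // PER_FACILITY for copy in range(COPIES)}


def facility_copies(facility: int) -> Set[int]:
    """All copies that disappear if one facility goes down."""
    return {copy for copy, fac in PLACEMENT.items() if fac == facility}


def write_acknowledged(failed: Set[int]) -> bool:
    """A write can still be acknowledged if at least WRITE_QUORUM copies remain healthy."""
    return COPIES - len(failed) >= WRITE_QUORUM


def write_quorums_always_overlap() -> bool:
    """4 + 4 > 6, so any two write quorums share at least one copy."""
    quorums = [set(q) for q in combinations(range(COPIES), WRITE_QUORUM)]
    return all(a & b for a in quorums for b in quorums)
```

Losing a whole facility leaves 4 healthy copies, so `write_acknowledged(facility_copies(0))` is still `True`; losing that facility plus one more copy drops to 3 and writes stall. The overlap check is why two conflicting writes cannot both reach a quorum on disjoint sets of copies.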
I would argue Microsoft's strategy actually makes them more wedded and committed to ensuring the vitality of open source PostgreSQL than anything AWS is doing.
The big news here: Citus Data donated 1% of their equity to non-profit PostgreSQL organizations[1], so this acquisition is a win for the community even in the darkest scenario of Citus Data disappearing into a canyon on the Microsoft campus.
Given Microsoft's change in operation over recent years there's also hope that they can continue their contributions into the future.
It's fascinating to see Microsoft leave behind the "embrace, extend, extinguish" narrative only to have Amazon adopt it, causing massive rifts and action within the database community[2][3]. I am genuinely concerned about the future of open source software in this continued scenario.
An article with what I considered an outrageous headline ("Is Amazon 'strip mining' open source?"[4]) has only rung more true over time. Amazon is one of the largest companies on earth, selling products that they receive for free but never improve[5], attacking the primary open source provider, and then shifting toward their comparable proprietary closed offerings.
Hopefully new ways to "give back", such as equity contribution, can be one of the many paths forward needed to keep open source software healthy. Given how much innovation is unlocked by this, it'd be a crime to go back to the past era.
[5]: From [2], "Jay Kreps, a creator of Kafka and co-founder and CEO of Confluent ... said Amazon has not contributed a single line of code to the Apache Kafka open-source software and is not reselling Confluent’s cloud tool."
I still can't get over the fact that Microsoft is using Postgres internally, if you had told me that 5 years ago I wouldn't have believed it. Did they go into why over MSSQL?
The main question is: did MS want an expert PgSQL team to work on Azure PostgreSQL (and maybe to create a proprietary competitor to Aurora)? Or did they acquire Citus for its product, to improve and market it further?
It feels like it was the first. If so, it's bad news for the Citus product, as it will most likely be ignored for a while. That would be really sad, as I don't know of any actively supported automated sharding solution for PgSQL other than Citus. There is Postgres-XL[1], but there isn't much focus on making it community friendly.
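For readers wondering what "automated sharding" does in practice: rows are routed to shards by hashing a distribution column, and shards are spread across worker nodes, so queries on one key touch one node. The Python below is a generic hash-partitioning sketch under stated assumptions, not Citus's actual algorithm; the shard count, worker names, and the use of md5 are all hypothetical choices.

```python
import hashlib
from typing import Dict, List

SHARD_COUNT = 32  # hypothetical; real deployments choose their own shard count


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard number with a stable hash.

    md5 is used only because it is deterministic across processes (Python's
    built-in hash() of a str is randomized per run); it is not Citus's hash.
    """
    digest = hashlib.md5(distribution_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count


def place_shards(shard_count: int, workers: List[str]) -> Dict[int, str]:
    """Spread shards over worker nodes round-robin."""
    return {shard: workers[shard % len(workers)] for shard in range(shard_count)}


def route(distribution_key: str, placement: Dict[int, str]) -> str:
    """Find the worker that holds every row with this distribution key."""
    return placement[shard_for(distribution_key, len(placement))]
```

With `placement = place_shards(SHARD_COUNT, ["worker-1", "worker-2"])`, every call to `route("tenant-42", placement)` lands on the same worker, while a query across all tenants has to fan out to every shard. Using many more shards than workers is a common design choice: adding a node means moving whole shards rather than rehashing rows.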
I don't think anyone should expect acquihiring an expert Postgres team to work on a proprietary product to work well, because the programmers' skills are eminently transferrable.
Half the team would probably wander off to work for one of the other postgres-centered companies (and quite possibly continue to work on the open source Citus code).
Great news for Citus, Microsoft, Postgres and for people using open source relational databases. This makes so much sense. (I know this comment might read naive to some but I’m genuinely excited right now)
I'm pretty excited as well... Especially if this means improvements to Azure's PostgreSQL options. DBaaS is one of the areas where cloud providers give a LOT of value, more so as long as the interfaces you use can be used internally/locally for development.
Similarly, I really appreciate MS-SQL for Linux on Docker, as it is a lot easier to set up for CI/CD and locally for dev and testing, and it is nearly transparent going to Azure SQL or MS SQL Enterprise for hosted deployments. I'd much rather use PostgreSQL with PLv8 than MS-SQL, though.
I wonder how long it will be before they shut down their own Citus Cloud hosted offering, which is hosted on AWS. It seems obvious that it will become part of Azure soon.
It's going to be really funny if Microsoft ends up using Open Source software to compete against its proprietary service-based competitors. Sort of like how GCP runs k8s... you can use the free tool, or you can use the managed service, and the community helps build the thing. In theory, you retain competitive advantage because you have the most expertise in the product.
The Googles of the world lose out on professional services, but Microsoft could still make a bundle of money by just consulting on the tools without even managing them. You might even make higher margins by not managing the service.
Because MS is more and more in the business of selling the operation of software as a service, instead of selling licenses for their customers to operate themselves.
Think of them like a wedding event rental company: they are more than happy to rent you their own brand of tables, flatware, and silverware, but if you want another brand that’s fine too, as long as you buy from them.
Microsoft is hedging their bets. PostgreSQL has the potential to disrupt the traditional relational database market, so if they're going to be disrupted then better to do it themselves.
I expect they'll also try to port Citus Data functionality to the SQL Server platform.
A little off topic, but I wonder how long it will be before MS acquires Docker Inc. Seems like an even better fit for them now that they own GitHub. GitHub + Docker Hub on the developer engagement side and Docker Enterprise on the traditional enterprise side.
I'm wondering how much OCI and CRI-O have impacted Docker's value proposition. Docker Hub seems more and more like the real product, though I guess you could argue that the container runtime was never really a product in the first place.
I haven't used Citus but once looked into cstore_fdw. How much of this is about cstore_fdw? I am curious because in the data warehousing space my experience has been that column store databases totally rule when it comes to speed on analytics. I know SQL Server has columnstore indexes, but those require you to create them, whereas with a genuine column store you get the performance boost by virtue of how data is stored.
SQL Server indexes can be either clustered or non-clustered, which determines whether table data is stored by index order. If you have a clustered columnstore index then the table is actually physically stored in a column-oriented format. Combined with vectorized processing, an impressive query optimizer, and in-memory tables, MSSQL is one of the fastest OLAP systems available.
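The storage-layout point in this exchange can be shown in miniature. The Python sketch below counts "values read" as a crude stand-in for I/O when computing one aggregate over a row layout versus a column layout. It is an illustration only: the table and function names are hypothetical, and real column stores (SQL Server's clustered columnstore included) add compression and vectorized execution on top of the layout itself.

```python
from typing import Dict, List, Tuple

# Hypothetical sales table: (order_id, region, amount)
Row = Tuple[int, str, float]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one contiguous list per column."""
    return {
        "order_id": [r[0] for r in rows],
        "region": [r[1] for r in rows],
        "amount": [r[2] for r in rows],
    }


def sum_amount_row_store(rows: List[Row]) -> Tuple[float, int]:
    """SUM(amount) over rows: every field of every row is read along the way."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)
        total += row[2]
    return total, values_read


def sum_amount_column_store(columns: Dict[str, list]) -> Tuple[float, int]:
    """SUM(amount) over columns: only the amount column is scanned."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)
```

For three rows the row layout touches 9 values to produce the total while the column layout touches 3; with wide fact tables and queries that need only a few columns, that gap is where most of the analytic speedup comes from.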
Also Cstore_fdw is rather obsolete and more of an experiment. It's a rough wrapper around ORC files and is missing many features, advancements and an execution engine to match the performance and usability of a real OLAP database.
For data analytics I use ClickHouse instead of PostgreSQL. There is a PostgreSQL Foreign Data Wrapper (FDW) for the ClickHouse database, but I have never used it.
Maybe now they will actually add a free tier so people can sign up for this, develop their product using a free tier, and upgrade when they launch, as is the natural progression with most other cloud products. I think before there were some complexity and/or financial issues preventing this but with Microsoft's wallet it shouldn't be an issue.
cdbattags | 7 years ago
craigkerstiens | 7 years ago
[1]: https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes...
[2]: https://www.citusdata.com/blog/2017/10/25/what-it-means-to-b...
ABeeSea | 7 years ago
manigandham | 7 years ago
stingraycharles | 7 years ago
SEJeff | 7 years ago
https://www.citusdata.com/product/community
ohthehugemanate | 7 years ago
spullara | 7 years ago
timClicks | 7 years ago
pjmlp | 7 years ago
sudhirj | 7 years ago
zjaffee | 7 years ago
scarface74 | 7 years ago
illumin8 | 7 years ago
cbsmith | 7 years ago
Smerity | 7 years ago
[1]: https://www.citusdata.com/newsroom/press/citus-data-donates-...
[2]: https://www.cnbc.com/2018/11/30/aws-is-competing-with-its-cu...
[3]: https://techcrunch.com/2019/01/09/aws-gives-open-source-the-...
[4]: https://www.cbronline.com/analysis/aws-managed-kafka
koolba | 7 years ago
craigkerstiens | 7 years ago
jarym | 7 years ago
They made a decent cloud business model out of it (no idea how successful but everyone I asked was happy with it).
I just hope Microsoft allow the tech to evolve as open source!
iKevinShah | 7 years ago
Current Microsoft sure will. They're good with open source stuff.
manigandham | 7 years ago
Considering the competitive database landscape, this is a compelling offering to add to any cloud portfolio. Congrats to the Citus team.
skunkworker | 7 years ago
pritambarhate | 7 years ago
[1]: https://www.postgres-xl.org/overview/
mjw1007 | 7 years ago
scarface74 | 7 years ago
tosh | 7 years ago
tracker1 | 7 years ago
AlexB138 | 7 years ago
peterwwillis | 7 years ago
areohbe | 7 years ago
reacharavindh | 7 years ago
Dangeranger | 7 years ago
nradov | 7 years ago
talawahdotnet | 7 years ago
barbecue_sauce | 7 years ago
gaius | 7 years ago
mmsimanga | 7 years ago
massaman_yams | 7 years ago
see here: https://tech.marksblogg.com/benchmarks.html
manigandham | 7 years ago
olavgg | 7 years ago
diminish | 7 years ago
simonw | 7 years ago
jchristopherinc | 7 years ago
jaxn | 7 years ago
oarabbus_ | 7 years ago
Apaec | 7 years ago
Shouldn't sustainability be the primary goal instead of making big bucks temporarily?
taormina | 7 years ago
sam0x17 | 7 years ago
yingw787 | 7 years ago
rockker | 7 years ago