
Microsoft acquires Citus Data (YC S11)

707 points | whatok | 7 years ago | blogs.microsoft.com

187 comments

cdbattags | 7 years ago
In the latest world of Postgres:

- we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never make it back upstream (who knows whether it's hardware, software, or something else behind the scenes here)

- we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood

- lastly, with this news, looks like Microsoft is now doubling down on the same strategy to build out infrastructure and _possibly_ closed source "forked" wins on top of the beautiful open source world that is Postgres

Please, please, please let's be sure to upstream! I love the cloud, but when I go to "snapshot" and "restore" my PG DB I want a little transparency into how y'all are doing this. Same with DocumentDB; I'd love an article on how they are using JSONB indices at this supposed scale! Not trying to throw shade; just raising my eyebrows a little.

craigkerstiens | 7 years ago
Craig here from Citus. We're actually a bit different than past forks. Many years ago Citus itself was a fork, but about 3 years ago we became a pure extension[1]. This means we hook into lower level extension APIs[2] that exist within Postgres and are able to stay current with the latest Postgres versions.

[1]. https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes...

[2]. https://www.citusdata.com/blog/2017/10/25/what-it-means-to-b...

ABeeSea | 7 years ago
If the creators of Postgres wanted all improvements to be upstreamed, they wouldn’t have released under a permissive license. The ability to use Postgres commercially without exposing your entire codebase to copyleft risk is one of the reasons it’s used commercially in the first place.
manigandham | 7 years ago
Amazon Aurora doesn't have much to do with Postgres and is a custom storage subsystem used by many different database engines. Aurora Postgres is actually using Postgres code on top to handle queries, and eventually PG itself will get pluggable storage engines.

It's similar with Redshift although it's a much older codebase from the v8 branch with more customizations. The changes are very specific to their infrastructure and wouldn't help anyone else since it's not designed as an on-prem deployable product.

There's also no confirmation that DocumentDB runs on Postgres, and it's most likely a custom interface layer they wrote themselves. If you just want MongoDB on Postgres, there are already open source projects that do it.

stingraycharles | 7 years ago
CitusData made tons of improvements to upstream postgresql, though. Can’t say that about Amazon.
ohthehugemanate | 7 years ago
Kudos to Azure for opening so much of what they do. Lots of Kubernetes work, including AKS-engine, which runs their k8s implementation. Machine learning toolkit. Media services (faceid etc.) as a container. The whole Azure shebang runs on Service Fabric, which they've also open sourced.

It's a differentiator for some of their workloads: you don't have to hand your business over to a black box.

spullara | 7 years ago
Aurora databases and DocumentDB share the same underlying reliable single-writer, many-reader block device for storage. That is all the magic. Not sure where you got the idea that DocumentDB has Postgres underneath it.
timClicks | 7 years ago
I get what you're saying, but BSD-style licenses are specifically designed to allow changes that are never sent upstream. I don't understand why people moan about companies complying with the license agreement.
pjmlp | 7 years ago
This is what happens in a world devoid of the GPL, or where a large majority doesn't sponsor the work of upstream.
sudhirj | 7 years ago
Amazon has explained in their re:Invent videos that Aurora is the storage layer of Postgres rewritten to be tightly coupled to their AWS infrastructure. So it is just regular Postgres (they upgrade to the latest version on a slightly slower cadence). And there's no benefit to getting the Aurora layer upstream; no one else could use it anyway.

Citus is an extension, not a fork.

So neither of these projects are doing Postgres a dis-service. Both are actually pretty heavily aligned with the continued success and maintenance of mainline open source Postgres.

zjaffee | 7 years ago
This is the future and it's not just big companies doing it.

Virtually all of the companies that were built on open source products in the past few years have stopped focusing on being the best place to run said open source program, and are instead holding back performance and feature improvements as proprietary rather than pushing them back upstream.

scarface74 | 7 years ago
> we now have closed source Amazon Aurora infrastructure that boasts performance gains that might never see it back upstream (who knows if it's just hardware or software or what behind the scenes here)

The performance benefits of Aurora over Postgres are mostly because Amazon rewrote the storage engine to run on top of their infrastructure.

illumin8 | 7 years ago
> - we now have Amazon DocumentDB that is a closed source MongoDB-like scripting interface with Postgres under the hood

To clarify, Amazon DocumentDB uses the Aurora storage engine, which is the same proprietary storage engine that is used by Aurora MySQL and Aurora PostgreSQL, and gives you multi-facility durability by writing 6 copies of your data across 3 facilities, with a 4 of 6 quorum before writes are acknowledged back to the client.

So, it's a bit inaccurate to say that DocumentDB has anything to do with Postgres.
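To make the quorum arithmetic above concrete, here is a minimal Python sketch. It only models the numbers stated in the comment (6 copies, 3 facilities, 4-of-6 write quorum) and assumes the copies are spread evenly, 2 per facility; it is a toy model, not AWS's actual implementation, and the function names are hypothetical.

```python
# Toy model of the quorum numbers above: 6 copies over 3 facilities,
# writes acknowledged once 4 of the 6 copies confirm.
# Assumption: copies are spread evenly, 2 per facility.
COPIES = 6
FACILITIES = 3
WRITE_QUORUM = 4
COPIES_PER_FACILITY = COPIES // FACILITIES  # 2


def surviving_copies(failed_facilities: int, extra_failed_copies: int = 0) -> int:
    """Copies still reachable after losing whole facilities plus some single copies."""
    lost = failed_facilities * COPIES_PER_FACILITY + extra_failed_copies
    return max(COPIES - lost, 0)


def write_acknowledged(acks: int) -> bool:
    """A write is acknowledged to the client only once WRITE_QUORUM copies confirm it."""
    return acks >= WRITE_QUORUM


# Any two write quorums must share at least one copy (4 + 4 > 6),
# so two conflicting writes can never both be acknowledged by disjoint sets.
quorums_overlap = 2 * WRITE_QUORUM > COPIES

print(write_acknowledged(surviving_copies(failed_facilities=1)))  # True: 4 copies remain
print(write_acknowledged(surviving_copies(failed_facilities=1, extra_failed_copies=1)))  # False: 3 remain
print(quorums_overlap)  # True
```

Under these assumptions, losing an entire facility still leaves exactly enough copies to keep accepting writes, while losing one more copy on top of that does not.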

cbsmith | 7 years ago
I would argue Microsoft's strategy actually makes them more wedded and committed to ensuring the vitality of open source PostgreSQL than anything AWS is doing.
Smerity | 7 years ago
The big news here: Citus Data donated 1% of their equity to non-profit PostgreSQL organizations[1] so this acquisition is a win for the community even in the darkest scenario of Citus Data disappearing into a canyon on the Microsoft campus.

Given Microsoft's change in operation over recent years there's also hope that they can continue their contributions into the future.

It's fascinating to see Microsoft leave behind the "embrace, extend, extinguish" narrative only to have Amazon adopt it, causing massive rifts and action within the database community[2][3]. I am genuinely concerned about the future of open source software in this continued scenario.

An article with what I considered an outrageous headline ("Is Amazon 'strip mining' open source?"[4]) has only rung more true over time. Amazon is one of the largest companies on earth, selling products that they receive for free but never improve[5], attacking the primary open source provider, and then shifting toward their own comparable proprietary, closed offerings.

Hopefully new ways to "give back", such as equity contribution, can be one of the many paths forward needed to keep open source software healthy. Given how much innovation is unlocked by this, it'd be a crime to go back to the past era.

[1]: https://www.citusdata.com/newsroom/press/citus-data-donates-...

[2]: https://www.cnbc.com/2018/11/30/aws-is-competing-with-its-cu...

[3]: https://techcrunch.com/2019/01/09/aws-gives-open-source-the-...

[4]: https://www.cbronline.com/analysis/aws-managed-kafka

[5]: From [2], "Jay Kreps, a creator of Kafka and co-founder and CEO of Confluent ... said Amazon has not contributed a single line of code to the Apache Kafka open-source software and is not reselling Confluent’s cloud tool."

koolba | 7 years ago
Any clue what the base for that 1% is going to be? Didn’t see any mention of the total acquisition amount anywhere.
jarym | 7 years ago
Well this is great news for the guys at Citus - they created something great as a Postgres add-on and a big chunk of it was open sourced.

They made a decent cloud business model out of it (no idea how successful but everyone I asked was happy with it).

I just hope Microsoft allow the tech to evolve as open source!

iKevinShah | 7 years ago
"I just hope Microsoft allow the tech to evolve as open source!"

Current Microsoft sure will. They're good with open source stuff.

manigandham | 7 years ago
Citus is already used by Microsoft itself internally, a recent example being the VeniceDB project to analyze Windows telemetry: https://www.youtube.com/watch?v=AeMaBwd90SI

Considering the competitive database landscape, this is a compelling offering to add to any cloud portfolio. Congrats to the Citus team.

skunkworker | 7 years ago
I still can't get over the fact that Microsoft is using Postgres internally, if you had told me that 5 years ago I wouldn't have believed it. Did they go into why over MSSQL?
pritambarhate | 7 years ago
The main question is: did MS want an expert PgSQL team to work on Azure PostgreSQL (and maybe to create a proprietary competitor to Aurora)? Or did they acquire Citus for its product, to improve and market it further?

It feels like it was the former. If so, it means bad news for the Citus product, as it will most likely be ignored for a while. That would be really sad, as I don't know of any actively supported automated sharding solution for PgSQL other than Citus. There is PostgresXL[1], but there isn't much focus on making it community friendly.

[1]: https://www.postgres-xl.org/overview/

mjw1007 | 7 years ago
I don't think anyone should expect acquihiring an expert Postgres team to work on a proprietary product to work well, because the programmers' skills are eminently transferrable.

Half the team would probably wander off to work for one of the other postgres-centered companies (and quite possibly continue to work on the open source Citus code).

scarface74 | 7 years ago
This is more of a competitor to Redshift than Aurora.
tosh | 7 years ago
Great news for Citus, Microsoft, Postgres and for people using open source relational databases. This makes so much sense. (I know this comment might read naive to some but I’m genuinely excited right now)
tracker1 | 7 years ago
I'm pretty excited as well... Especially if this means improvements to Azure's PostgreSQL options. DBaaS is one of the areas where cloud providers give a LOT of value, more so as long as the interfaces you use can be used internally/locally for development.

Similarly, I really appreciate MS-SQL for Linux on Docker, as it is a lot easier to set up for CI/CD and locally for dev and testing, and is nearly transparent going to Azure SQL or MS SQL Enterprise for hosted deployments. I'd much rather use PostgreSQL with PLv8 than MS-SQL, though.

AlexB138 | 7 years ago
I wonder how long it will be before they shut down their own Citus Cloud hosted offering, which is hosted on AWS. It seems obvious that it will become part of Azure soon.
peterwwillis | 7 years ago
It's going to be really funny if Microsoft ends up using Open Source software to compete against its proprietary service-based competitors. Sort of like how GCP runs k8s... you can use the free tool, or you can use the managed service, and the community helps build the thing. In theory, you retain competitive advantage because you have the most expertise in the product.

The Googles of the world lose out on professional services, but Microsoft could still make a bundle of money by just consulting on the tools without even managing them. You might even make higher margins by not managing the service.

areohbe | 7 years ago
Congrats, Citus team. Just please keep the blog alive! Craig's posts are some of my favorite Postgres reads.
reacharavindh | 7 years ago
So, a part of Microsoft will advocate for SQL server, and another part will develop for PostgreSQL? Isn't it weird? Why would Microsoft want this?
Dangeranger | 7 years ago
Because MS is more and more in the business of selling the operation of software as a service, instead of selling licenses for their customers to operate themselves.

Think of them like a wedding event rental company: they are more than happy to rent you their own brand of tables, flatware, and silverware, but if you want another brand that’s fine too, as long as you buy from them.

nradov | 7 years ago
Microsoft is hedging their bets. PostgreSQL has the potential to disrupt the traditional relational database market, so if they're going to be disrupted then better to do it themselves.

I expect they'll also try to port Citus Data functionality to the SQL Server platform.

talawahdotnet | 7 years ago
A little off topic, but I wonder how long it will be before MS acquires Docker Inc. Seems like an even better fit for them now that they own GitHub. GitHub + Docker Hub on the developer engagement side and Docker Enterprise on the traditional enterprise side.
barbecue_sauce | 7 years ago
I'm wondering how much OCI and CRI-O have impacted Docker's value proposition. Docker Hub seems more and more like the real product, though I guess you could argue that the container runtime was never really a product in the first place.
gaius | 7 years ago
Docker the company is a lame duck, and docker the software is being rapidly supplanted by podman and buildah. There would be no point.
mmsimanga | 7 years ago
I haven't used Citus but once thought about Cstore_fdw. How much of this is about Cstore_fdw? I am curious because in data warehousing space my experience has been column store databases totally rule when it comes to speed on analytics. I know SQL Server has column store indexes but that requires you to create them whereas with genuine column store you get the performance boost by virtue of how data is stored.
manigandham | 7 years ago
SQL Server indexes can be either clustered or non-clustered, which determines whether table data is stored by index order. If you have a clustered columnstore index then the table is actually physically stored in a column-oriented format. Combined with vectorized processing, an impressive query optimizer, and in-memory tables, MSSQL is one of the fastest OLAP systems available.

Also Cstore_fdw is rather obsolete and more of an experiment. It's a rough wrapper around ORC files and is missing many features, advancements and an execution engine to match the performance and usability of a real OLAP database.
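The row-store vs column-store distinction discussed above can be illustrated with a toy in-memory Python sketch. The table and its columns (`order_id`, `region`, `amount`) are hypothetical, and the "values read" counter is only a stand-in for I/O; real engines such as SQL Server columnstore or ORC files add compression, segment skipping, and vectorized execution that this sketch does not model.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (order_id, region, amount)
Row = Tuple[int, str, float]

rows: List[Row] = [
    (1, "eu", 10.0),
    (2, "us", 25.5),
    (3, "eu", 7.25),
]

# Row-oriented layout: all fields of a record are stored together.
row_store: List[Row] = list(rows)

# Column-oriented layout: all values of a column are stored together.
column_store: Dict[str, list] = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def scan_row_store(store: List[Row]) -> Tuple[float, int]:
    """SUM(amount) over a row store: every field of every row is fetched."""
    total, values_read = 0.0, 0
    for r in store:
        values_read += len(r)  # the whole row comes along, not just `amount`
        total += r[2]
    return total, values_read


def scan_column_store(store: Dict[str, list]) -> Tuple[float, int]:
    """SUM(amount) over a column store: only the `amount` column is read."""
    col = store["amount"]
    return sum(col), len(col)


print(scan_row_store(row_store))        # (42.75, 9)
print(scan_column_store(column_store))  # (42.75, 3)
```

Both layouts give the same answer, but the analytic query touches only one of three columns, which is the basic reason column stores tend to win on wide-table aggregations.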

olavgg | 7 years ago
For data analytics I use ClickHouse instead of PostgreSQL. There is a PostgreSQL Foreign Data Wrapper (FDW) for the ClickHouse database, but I have never used it.
diminish | 7 years ago
Does anyone know any details about the financials of the deal? Is this an acquihire or more?
simonw | 7 years ago
There's no way this was just an acquihire: Citus represents some truly impressive computer science.
jchristopherinc | 7 years ago
Happy for the folks at Citus. I use Citus at work and it's amazing. Hope things stay the same after the acquisition.
jaxn | 7 years ago
My sentiments as well. Great team to work with and really like the product.
oarabbus_ | 7 years ago
Speaking as a data professional and SQL addict, I was always impressed when I came across Citus Data posts. Good acquisition by Microsoft.
Apaec | 7 years ago
Why is an acquisition a win for the company? It seems to me like the big company is killing the small one and absorbing its soul (brand).

Shouldn't sustainability be the primary goal instead of making big bucks temporarily?

sam0x17 | 7 years ago
Maybe now they will actually add a free tier so people can sign up for this, develop their product using a free tier, and upgrade when they launch, as is the natural progression with most other cloud products. I think before there were some complexity and/or financial issues preventing this but with Microsoft's wallet it shouldn't be an issue.
yingw787 | 7 years ago
Congratulations to the Citus Data team! I don't have anything significant to add, but I loved the free socks you gave out :)
rockker | 7 years ago
Wonder if they will have Microsoft socks now :)