
Unhappy Programmer: Whining about Thrift and Cassandra

8 points| cassie | 15 years ago | reply

Earlier this year I embarked on a project where we had the opportunity to experiment with a lot of new things. We had been given a lot of freedoms that most projects do not enjoy.

The first technology we looked at was Google Protocol Buffers for representing data. Protobuffers are done really well and they are easy to use in the multitude of languages we care about. The problem is that one critical piece is missing: Google never bothered releasing their RPC mechanism, and word around the campfire is that it is never going to happen due to the spaghetti of deep interdependencies this RPC layer has with lots of other stuff in the Google codebase. Also, no open source project has filled the spot in any convincing manner.

So we dropped protobuffers and had a look at Thrift. On the surface of things Thrift fits the bill. Thrift isn't quite as nice as protobuffers but it is close. More importantly, it comes with an RPC mechanism. Or shall we say, it comes with various RPC mechanism options.

Okay, we have an RPC layer. On to the storage system.

We had a look at CouchDB, Voldemort, Cassandra and HBase and decided to give Cassandra a chance.

Now Cassandra has been getting a lot of press lately and it is indeed a relatively sweet system. There are a few problems though.

First off, would it have killed the designers to use nomenclature that makes sense to people? Call a table a table. Call a row a row. And who the hell figured it would be a good idea to refer to tuples and maps as various types of columns? Most people have certain expectations of what a "column" is and it ain't what the Cassandra designers think.

The reason you have types such as Set, Map and List in Java is that these have defined meanings in mathematics. The Java designers didn't just assign new meanings to words that already had meaning.

Second, neither Thrift nor Cassandra is available through any official Maven repository yet, even though it has been forever since they were released as open source. That right there is a big warning sign.

It means that people need to fiddle around embedding dependencies in their projects. Even Cassandra is using an older version of Thrift, which it has to embed in the build -- so if you were thinking of using a relatively new version of Thrift in your Cassandra-backed Thrift service you have to think again (or go through the pain of making two versions of the same library play nice within the same JVM).

Third, some of the attitudes really stink.

While reading through the mailing list archives I came across an issue that I myself experienced. I had a Thrift service running and all of a sudden it crashed with an OutOfMemoryError. I tracked down the bug to the framed transport implementation and to my horror discovered that it is a fairly naive implementation: it'll just read a sequence of bytes off the wire, interpret them as a number and then try to allocate a buffer of that size. There was not even a comment in the code acknowledging that this is relatively poor design and that it might be an idea to implement a chunked framed transport (so you can move large objects while still discovering frame errors without committing lots of resources). But I digress. One of the responses I found in the archives amounted to "don't do that then. thrift is expected to run in a trusted environment". Huh!? Have these people even worked for an Internet company before? There's a lot of stuff going on in a datacenter and you CANNOT have a critical system go down just because some program erroneously connects to the wrong port.
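To make the complaint concrete, here is a minimal sketch of the kind of guard I would have expected to find. This is not Thrift's code: the class name, the `readFrame` helper and the 16 MB cap are all made up for illustration, and I am assuming a 4-byte big-endian length prefix in front of each frame.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedFrameReader {
    // Hypothetical cap; pick whatever fits the largest message you expect.
    static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;

    // Reads one length-prefixed frame, refusing to allocate for absurd lengths.
    public static byte[] readFrame(InputStream in, int maxFrameBytes) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // assumed 4-byte big-endian length prefix
        if (length < 0 || length > maxFrameBytes) {
            // Fail before allocating anything, so a stray client cannot
            // trigger an OutOfMemoryError.
            throw new IOException("Rejecting frame with declared length " + length);
        }
        byte[] frame = new byte[length];
        data.readFully(frame); // throws EOFException if the peer sends less
        return frame;
    }

    public static void main(String[] args) {
        // Some other program connecting to the wrong port: the first four
        // bytes of an HTTP request get interpreted as a huge frame length.
        byte[] junk = "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        try {
            readFrame(new ByteArrayInputStream(junk), MAX_FRAME_BYTES);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A cap like this doesn't give you chunked transport, but it does turn "the whole service dies" into "one bad connection gets dropped".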

Fourth, while trying to develop for Cassandra I needed to implement proper unit tests. This proved to be amazingly fiddly since embedding Cassandra is a sheer nightmare. I looked at how one of the client library designers had done it and ... I got a bit sad. Not an elegant solution. The short version is: if there is any chance you'll be running more than one instance of Cassandra at the same time in the same JVM you are fucked. I guess someone didn't get the memo on how to design singletons properly.
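For what it's worth, this is roughly what I mean by not baking everything into process-wide singletons. The sketch below is purely illustrative: `Node`, its constructor arguments and the in-memory map are invented for the example and have nothing to do with Cassandra's actual classes. The point is only that when state hangs off an instance rather than static fields, two nodes can coexist in one JVM, which is exactly what a unit test wants.

```java
import java.util.HashMap;
import java.util.Map;

public class EmbeddedNodeSketch {
    // Hypothetical stand-in for an embeddable storage node (not Cassandra's API).
    public static final class Node {
        private final String dataDir; // per-instance, not a static/global setting
        private final int port;       // per-instance, so two nodes do not collide
        private final Map<String, String> store = new HashMap<>();

        public Node(String dataDir, int port) {
            this.dataDir = dataDir;
            this.port = port;
        }

        public void put(String key, String value) { store.put(key, value); }
        public String get(String key) { return store.get(key); }
        public String dataDir() { return dataDir; }
        public int port() { return port; }
    }

    public static void main(String[] args) {
        // Two independent nodes in the same JVM; paths and ports are made up.
        Node a = new Node("/tmp/node-a", 9160);
        Node b = new Node("/tmp/node-b", 9161);
        a.put("k", "v");
        System.out.println("a sees: " + a.get("k"));
        System.out.println("b sees: " + b.get("k")); // b has its own state
    }
}
```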

Fifth, I am really amazed by the fact that Facebook just threw Thrift and Cassandra over the fence and then never bothered to make sure that things progressed to a usable state in a timely manner. Right now, wide adoption of Cassandra and Thrift is gridlocked. I see people play with them for a while and then ditch them. Why did Facebook open source them in the first place if they never intended to drive the projects forward? (Same goes for Google: why on earth did they just dump Protobuffers into open source and then largely abandon it?)

As mentioned earlier, it has been forever since Thrift was open sourced, and the thing is still not available through any official Maven repository. Which means that EVERY downstream project is affected. Including Cassandra.

I didn't want to write this piece. What I wanted was to sit down, learn the codebase thoroughly and see where I could contribute to pushing these things to where they should already be (even though I've been told "well, good luck with getting any patches accepted"). I went to my managers and said that I would like to take a quarter out of our current project to make Cassandra and Thrift usable for everyone. Unfortunately we do not have the budget or the time for that.

I think Cassandra has huge potential, but that it is slowly being wasted. Not enough people care about the things that really matter to developers, and unless this is recognized and addressed I think Cassandra is going to have a really bleak future.

Yes, this is a whiny piece, but I am at my wits' end.

1 comment

isnoteasy | 15 years ago | reply
Well, you say you don't have the time or the budget to contribute to Cassandra and Thrift now. Don't despair, perhaps in the future you could contribute to those projects.