RethinkDB: An open-source distributed database built with love over three years

[+] mmorearty|13 years ago|reply

I don't know much about RethinkDB yet, but I will say that I have been a big fan (online) of one of its founders, Slava Akhmechet, for years. I've never met him, but he wrote some terrific articles on his website, http://www.defmacro.org/ , a few years ago. Start at the bottom of the list of articles, with "The Nature of Lisp."

Slava is a deep thinker, which makes me very excited to take a look at RethinkDB.

[+] aberman|13 years ago|reply

I wish I could up-vote this a thousand times. Slava is one of the most genuine founders I've met, and I wish him and RethinkDB all the best.

[+] jgw|13 years ago|reply

Indeed - he mentions in the article that he set himself a goal to convert 10 programmers into Lispers. Sounds like he probably has that many just in this thread! Kudos, sir!

[+] reinhardt|13 years ago|reply

So is RethinkDB written in Lisp?

[+] dxbydt|13 years ago|reply

Thought I'll share this with you.

A yc company hired me. I showed up at their mountain view office. The founder said "This is the former office of RethinkDB! I hope we are as successful as them."

I didn't know who/what RethinkDB was, so I said ok, sure.

3 days later he asked me to clear my desk and leave. He said "You are the sort of person who should work in RethinkDB".

So I asked "What does that mean ?"

He said "RethinkDB is trying to solve very deep algorithm problems. They want somebody with CS knowledge to do deep research. That is what you are good at. But here we are just trying to run a business. You are not a good fit for that!".

So I left.

[+] fragsworth|13 years ago|reply

I know lots of engineers who have trouble talking to people who don't share their knowledge. This problem is extremely pervasive - I'd say a good 25% or more have this problem to some extent. It's not a good thing when this happens - you need to be able to speak to laymen or you're gonna have a bad time.

I am going to go out on a limb here and suggest you try to work on being a bit more practical. Don't complicate things for the sake of solving difficult problems. Don't try to shower people with your engineering knowledge when it's not necessary, and don't expect everyone to know everything you do. And don't be an asshole about it either.

[+] flogic|13 years ago|reply

Wow that's special. Sounds to me like they're tools. You can't really expect an employee to know they're way around the code base after only 3 days. Hell most places you find yourself sitting on your thumbs the first week due to everyone being too busy to spend much time orienting you.

[+] jedberg|13 years ago|reply

Suggestion: It would be great to have a page on your website that explains why RethinkDB is better than the other prevailing options. Right now I don't know why I'd want to invest time setting up yet another database.

[+] coffeemug|13 years ago|reply

Thanks -- will do in the next few days.

[+] coffeemug|13 years ago|reply

Hey guys, Slava here. I've been up since yesterday, so I'm going to clock out (though some of the team members are still lurking here). I wanted to thank everyone for great feedback. We're working hard to improve Rethink over the next few months. FYI, you can always hop on IRC (#rethinkdb on freenode) or github tracker (https://github.com/rethinkdb/rethinkdb/issues) with questions and we'll help you out.

[+] biturd|13 years ago|reply

Thanks for this work, it looks really nice.

I was looking at the github comments about a home brew recipe in which it was stated that aside from a recipe creating a VM, the Mac OS X port would take a bit longer.

Is that a full port from one language to another? Or just an issue of the different flavors of *nix that need dealing with and probably some of the dependency tree issues that come with it?

I'm curious what needs be done to get it building on Mac OS X — perhaps I could assist somehow.

I see a few dependencies that don't immediately sound familiar. You may have better luck with MacPorts, which uses tcl as the language for their portfiles.

Portfiles are just like homebrews recipes, but MacPorts always builds new, including the entire dependency tree ( and dependencies of dependencies etc., etc. ), for which they have thousands of working portfiles. Since those are completed and working, you wouldn't have to worry about those until you wanted to be able to make a binary outside of any package manager.

MacPorts can build binaries now ( new feature ), so you could just as easily instruct it to create a standard Mac OS X installer .pkg which makes sure everything goes in the right place, on the right platform, for the right architecture.

They are an exceedingly friendly and helpful group, I'm sure they would live to see this software in their package/portfiles list.

[+] tedjdziuba|13 years ago|reply

[deleted]

[+] codewright|13 years ago|reply

I'm hoping this'll be a viable replacement for MongoDB. (Sparse/Schema-free is incredibly useful for me, as is JSON-centric modeling)

jedberg already asked for a compare/contrast, but let me provide some specifics I care about that you might be able to answer.

1. Is it fair to say that thanks to MVCC, running an aggregation or map-reduce job isn't going to lock the whole damn thing up like it does on MongoDB?

2. You've got a distributed system that is seemingly CP, do the availability/consistency semantics compare with HBase? Master-slave? Replication? Sharding?

3. Latency is a big one for us and is a large part of why we use ElasticSearch. How does the read-latency on RethinkDB compare with Mongo/MySQL/Redis/et al ?

[+] coffeemug|13 years ago|reply

1. Yes -- that was the main motivation for MVCC. We wanted to allow people to use rethinkdb for analytics and map/reduce on top of the realtime system without dealing with having to replicate data into something else.

2. Short answer: we favor consistency (via master/slave under the hood). It allows for much easier API, much fewer issues in production, etc. The user experience is just better. If you're ok with out of date results, you can do that too without paying the price of consistency guarantees. The downsite of our design is that you might lose write availability in case of netsplits (if the client is on the wrong side of the split). Longer answer: checkout the FAQ at http://www.rethinkdb.com/docs/advanced-faq/

3. Read latency should be equivalent to other comparable master/slave systems. We don't do quorums, so latency will be much better than quorum/dynamo-based designs.

[+] jbellis|13 years ago|reply

I'll ask the obvious question not in the FAQ: How is this different from MongoDB?

[+] coffeemug|13 years ago|reply

Hey, this is Slava, founder of rethinkdb. There are some obvious high level differences:

* A far more advanced query language -- distributed joins, subqueries, etc. -- almost anything you can do in SQL you can do in RethinkDB

* MVCC -- which means you can run analytics on your realtime system without locking up

* All queries are fully parallelized -- the compiler takes the query, breaks it up, distributes it, runs it in parallel, and gives you the results

But beyond that, details matter. Database system differ on what they make easy, not what they make possible. We spent an enormous amount of time on building the low-level architecture and working on a seamless user experience. If you play with the product, I think you'll see these differences right away.

Note: rethink is a new product, so it'll inevitably have quirks. We'll fix all the bugs as quickly as we can, but it'll take a few months to iron things out that didn't come up in testing.

[+] lacker|13 years ago|reply

One apparent difference between RethinkDB and MongoDB is that in RethinkDB, you can only index on the primary key. I imagine secondary indexes will be coming along soon.

[+] unknown|13 years ago|reply

[deleted]

[+] sutro|13 years ago|reply

How does RethinkDB perform when compared to open-source distributed databases built with hate?

[+] bsg75|13 years ago|reply

Maybe not built with hate, but used in anger?

[+] ww520|13 years ago|reply

Congratulate on releasing. Well done!

A few questions:

1. Will secondary indices be ever supported? Range scan with a different order than the primary key is very welcomed. E.g. date range query.

2. Do you support conditional update? Or any kind of optimistic locking or versioning to coordinate concurrent updates from different clients?

3. Related to 2. How can loosely-sequential Id be generated using a table?

4. Will some transaction support be added? Don't need full ACID, just grouping updates (intra-table and/or inter-tables) in one shot would be nice. Should be feasible with MVCC already in place.

5. Do all the clients hit a central server to initiate queries which then farms out the requests to different shards? Or the client library knows how to get to different shards directly? First case has a single-point-of-failure, and bottleneck in scaling.

6. Do you support automatically re-balancing of shard data (data migration) when new shards are added or old ones retired?

7. How are authentication and authorization done? Or any clients can come in?

8. Internal detail. For out-of-date distributed query on the slave replicas, is there a cost-based (or load-based) decision process to pick the most idle replica to do the sub-query?

9. Internal detail. Do you use Bloom Filter to optimize distributed joins?

[+] jdoliner|13 years ago|reply

> 1. Will secondary indices be ever supported? Range scan with a different order than the primary key is very welcomed. E.g. date range query.

Secondary indices are one of the most asked for features so they'll probably be added in the next release. No promises though secondary indices are tough to do right and we won't ship them if they're not great.

> 2. Do you support conditional update? Or any kind of optimistic locking or versioning to coordinate concurrent updates from different clients?

Updates can be done with conditions on the row. For example: table.filter(lambda x: x['age'] > 25).update(lambda x: {"salary" : x["salary"] + 25)

> 3. Related to 2. How can loosely-sequential Id be generated using a table?

Loosely-sequential IDs would have to be generated client side for now.

> 4. Will some transaction support be added? Don't need full ACID, just grouping updates (intra-table and/or inter-tables) in one shot would be nice. Should be feasible with MVCC already in place.

Eventually. No concrete timeline for this right now though.

> 5. Do all the clients hit a central server to initiate queries which then farms out the requests to different shards? Or the client library knows how to get to different shards directly? First case has a single-point-of-failure, and bottleneck in scaling.

A client makes a connection to a specific server and all queries go through that server. However every server can file this role so connections can be distributed and there's no single point of failure. An even better option is to run a proxy on the same machine as the client. For more info run:

rethinkdb --help proxy

> 6. Do you support automatically re-balancing of shard data (data migration) when new shards are added or old ones retired?

Right now sharding is a manual process. You tell the server how many shards you want and it handles figuring out how to evenly split the data, picking machines to host them and getting the data where it needs to go. What it doesn't do is readjust the split points when the data distribution changes. This will be a feature in RethinkDB 1.3.

> 7. How are authentication and authorization done? Or any clients can come in?

RethinkDB has no authentication built in to it. You should not allow people you don't trust to have access to it.

8. Internal detail. For out-of-date distributed query on the slave replicas, is there a cost-based (or load-based) decision process to pick the most idle replica to do the sub-query?

Right now we just select randomly. This is slated as a potential upgrade for 1.3. Especially if it proves to be a problem for people. Thus far it hasn't been for us in profiling runs but this is the type of problem that's more likely to show up in real world workloads.

9. Internal detail. Do you use Bloom Filter to optimize distributed joins?

We do not currently use bloom filters to optimize this.

[+] coffeemug|13 years ago|reply

1. Yes. It's a matter of doing this right, which will take some time.

2. Yes. There is no special command, you just combine update and branch (http://www.rethinkdb.com/api/#py:control_structures-branch) Here's an example in Python:

  r.table('foo').get(5).update({ 'bar': r.branch(r['baz'] == 0, 1, 2)})

This will set attribute bar to 1 if baz is 0, or to two 2 otherwise. Everything is atomic on that document.

3. Currently the server doesn't support a sequential (or even loosely sequential) id autogeneration. You'd have to do that on the clients, but using a timestamp for example.

4. I don't know yet how to do this really efficiently. It's relatively easy to do on a single shard, but cross-shard boundaries make this really hard.

5. Any client can connect to any server. The server will then parse and route the query. There is no central server, everything is peer-to-peer. The client library doesn't know about multiple servers now, so responsibility is on the user to hit a random server. Alternatively you can run "rethinkdb proxy" on localhost and connect the client to that. The proxy will then route queries to proper nodes in the cluster.

6. In the web UI, if you click on the table and reshard, everything will be rebalanced. You don't even have to add or remove shards, it'll just rebalance data for the number of shards you have. The UI has a bar graph with shard distribution, so you can see how balanced things are.

7. Currently there is no authentication support - we expect users to use proper firewall/ssh tunneling precautions.

8. Yes, that's how queries get routed. Currently this isn't very smart, but it will get much better over time. If something breaks for you performance-wise, just reach out and we'll fix it.

9. No, not yet. If you run eq_join on a small subset of the data (99% of OLTP workloads) it will be very fast. Other joins work ok, but there's A LOT of room for optimization.

Phew!

[+] DanielRibeiro|13 years ago|reply

Did they not launch a while ago: http://techcrunch.com/2011/06/06/rethinkdb-expands-beyond-ss... ?

[+] continuations|13 years ago|reply

* In the previous incarnation of rethinkdb the focus was on maximizing performance on SSDs. Is this still the case - does rethinkDB perform better than other databases on SSDs? Do you have any benchmark numbers?

* How does rethinkdb compare to MySQL Cluster? Both are distributed, replicated databases with a sql-like query language.

* Any plan to offer a java client?

[+] coffeemug|13 years ago|reply

* The SSD-optimized storage engine is running under the clustering engine. I'm wary of saying 'better' or 'worse' in case of benchmarks, because they're really tricky to do right. We'll be publishing well-researched benchmarks as soon as we can, but it will take time.

* RethinkDB has flexible schemas and a query language that integrates straight into the host programming language and doesn't require string interpolation. As far as clustering goes, RethinkDB is a) really really really easy to use, and b) does a lot of query parallelization and distribution that MySQL cluster doesn't do. The product feels totally different, I think in a good way. The downside, of course, is that rethink is new and it will take some time to work out all the kinks.

* I can't commit to a timeline yet, but yes, absolutely.

[+] erichocean|13 years ago|reply

I find JSON-oriented databases to be a huge limitation for writing applications managing any kind of financial data, due to the lack of a decimal number type and a timestamp/date type, both of which SQL provides (and are used A LOT).

Sure, you can put that stuff in strings, but then you'll run into limitation with queries where you want to, e.g., aggregate a total, or do timestamp arithmetic.

I could do everything with strings, custom map-reduce, etc., if you're inclined to suggest that as a workaround. Still doesn't mean JSON's a good idea.

[+] alexpopescu|13 years ago|reply

We thought of supporting data types that are not part of JSON (date/time/timestamp/deltas, etc.), but we wanted to take the time to do it right so these didn't make it to this version.

alex @ rethinkdb

[+] erichocean|13 years ago|reply

The other thing that bothers me about all these new JSON databases is they aren't really novel anymore.

Clustered databases are essentially a solved problem, and have been for years. What's needed today are databases solving the problem that Google Spanner addresses – global consistency across distributed clusters in separate data centers. If you want a challenge in the DB world, that's where it is.

But another clustered, schema-less JSON database? Might as well open up Intro to Algorithms and run through the exercises -- it's no longer a challenge, algorithmically or otherwise.

Sorry to be a downer on this, and it does still take a strong coder to implement one, so well done on that front. :)

[+] oh_sigh|13 years ago|reply

Just store it in a string. Or a special map structure: {"date": "2012-11-01", "format":"YYYY-MM-DD" }

[+] cuu508|13 years ago|reply

Or you could use fixed point math: 600 in database means $6.00 Then aggregates and comparison operators would work, but you would have to decide upfront how much precision you might ever need

[+] szopa|13 years ago|reply

Nice work! It seems that you are well aware of the tradeoffs that you are taking and communicating it openly in your documentation (and your choices seem to be very reasonable). I really like the tone of your communication – it seems essentially BS/koolaid free.

1. How much data can you put in one instance before seeing performance degradation? I know that you still working on good benchmarks – but do you have any ballpark figures?

2. How does replication work? Is it closer to row/document or statement based (or something completely different)? How fast is the replication?

3. What is your envisioned used of the replication? Are replicas supposed to serve read traffic, or their goal is to keep the data safe in case of a catastrophe?

4. Can you tell me something more about cluster configuration propagation? The Advanced FAQ answer doesn't get into much detail.

5. Am I correct to assume that you are using protocol buffers? What motivated your choice?

[+] pc|13 years ago|reply

This took tenacity. Congrats on shipping.

[+] jamesli|13 years ago|reply

Great work! One question: is there any manual that explains the implementation details of the internals? Some manual similar to those Oracle, MySQL, Postgres, etc. provide?

The only docs I found in the company website that goes deep into the internals are Advanced FAQ (http://www.rethinkdb.com/docs/advanced-faq/). It is more of an architecture view, though.

The reason I ask is that with a good understanding on the internals, the engineers who understand database internals and distributed systems will have an "more" accurate idea on the capabilities and the limits of the features. Thus, if they decide to adopt RethinkDB, the understanding will help them design their applications to take advantages of the benefits and avoid the potential issues (or surprises!). MongoDB was not very good at documentation. It claims this or that feature works smoothly. Then, people found out many potential issues and limitations. That is one reason it leaves a bad tastes to many engineers.

[+] coffeemug|13 years ago|reply

There currently isn't, beyond the advanced FAQ. This isn't by design -- writing really good detailed architecture papers takes a lot of time, and we were 100% focused on getting the product out. We'll get much better at documenting the internals, but it will take some time.

[+] harryh|13 years ago|reply

If I were you guys I'd strongly consider adding support for hashing of the shard key. There are many cases where you care about distributing your writes(1) a lot more than fast range queries on the PK.

-harryh

1. Yes, I know there are other ways to do this besides hashing the shard key, but this is often the best way.

[+] tjic|13 years ago|reply

What the heck does "built with love" even mean?

Is this just a hipster marketing term to tell us that it's small and cute and made by people who play ukuleles and ride unicycles in their spare time, and not by evil corporate people who commute to work and have mortgages?

I find a lot of advertising eyeroll inducing, and the current trend of more-hipster-than-thou posturing is right at the top.

[+] gruseom|13 years ago|reply

You could not have gotten these guys more wrong. They are serious technologists who have been working day and night for years to build something that they deeply believe in. Every hacker's heart should be warmed by the fact that they kept at it.

When you have a vision of something great that ought to exist and set about bringing it into the world, you are in an isolated position: other people don't yet see what you see. This leads to a lot of doubt by others and by yourself too. The longer it takes, the more exposed you are. To make it through that you are going to need a deeper source of motivation – an underground spring. Love is a fine word for this, and it makes me happy that Slava put it in his title: it's a clue to this experience that rarely gets mentioned, especially in the land of pivots and MVPs and weekend hacks.

[+] jacquesm|13 years ago|reply

Releasing a project like this means working very hard for a long time without anybody patting you on the back saying you're doing a good job. It's exhausting, a labour of love.

Get it now?

[+] VeejayRampay|13 years ago|reply

You look like an angry person man, chill the fuck out.

Rule of thumb is if you build something this nice and with that order of magnitude in complexity you can put My Little Poney stickers on your homepage and still get respect. Who cares about the "attitude" and the "language" for Christ's sake, they BUILT stuff with their own hands and are offering it to the world, they can do whatever they damn please.

[+] benatkin|13 years ago|reply

Since it's under the AGPL it will mostly be built by people that have been vetted. By switching to this people are one step closer to having a machine that boots without assholes. http://rusty.ozlabs.org/?p=196

[+] ketralnis|13 years ago|reply

What does hipster even mean here?

[+] amazedsaint|13 years ago|reply

Made with love = Made not just for money, but Made for making the world a better place.

[+] shykes|13 years ago|reply

I am very excited about this. The RethinkDB team is rock-solid and the market is only going to get bigger.

I particularly like the perspective of an easy onramp to get started, knowing that I will never have to leave because of scale or reliability.

Please, please give me a SQL adapter! My marketing team needs SQL. My business app developers need SQL. Give them an adapter and I will get them to use RethinkDB - knowing that 1) my data is safe and I'm not 6 months away from a painful re-architecture and migration, and 2) as my developers hit the limits of SQL they can gradually (gradually!) peel the paint off and start using your more powerful query language.

[+] haberman|13 years ago|reply

Is schemaless a win over an object schema like a JSON schema (or a Protocol Buffer .proto file)?

Schemaless is clearly a convenience win over SQL because SQL's way of modeling nested/repeated data doesn't map as easily onto programming languages. But for all the people who are using JSON-based databases these days, I'm curious how many of them couldn't easily write a JSON schema or a .proto file that describes their de facto schema.

I ask because a lot of things become easier to reason about (and optimize) if you know that a field won't be a string in one record and a number in another. And writing a .proto file (or equivalent JSON schema) would give you an authoritative place to document what all the fields actually mean.

I don't have any actual experience with JSON-based databases, so I was interested to hear the opinions of people who do.

[+] coffeemug|13 years ago|reply

There is of course no fundamental reason why JSON-based db's has to be schemaless. This is one interesting direction that might be worth exploring.

[+] embwbam|13 years ago|reply

this. I don't like SQL columns. They make life hard. But I'm spending time learning TypeScript specifically so I can add some types/schemas to my JavaScript.

That doesn't mean I want to deal with the implementation detail of columns, but I definitely wouldn't mind some type safety.

[+] m0th87|13 years ago|reply

How do filters work? They seem pretty difficult implementation-wise since you can write them in any of the language bindings. My first guess is that you pipe all the data in a table to the client, and the client itself does the filtration. But this would be extraordinarily inefficient.

[+] ch0wn|13 years ago|reply

This looks really interesting. I'm interested to see how their license choice works out. The server is AGPL-licensed while the drivers are under Apache 2.0. This should at least avoid the issues we all know from libmysqlclient.

[+] jedahan|13 years ago|reply

Last I heard RethinkDB was a tail-append style engine for MySQL that was optimized for SSDs. Interesting to see a drastic pivot like this. Looks good, and good luck.

237 comments