top | item 9374927

RethinkDB 2.0 is now production ready

407 points | hopeless | 11 years ago | rethinkdb.com | reply

152 comments

[+] richardwigley|11 years ago|reply
Is Rethink going to stay in the community? Or is there a chance that it could be bought out? I don't want to spend time learning something and have it go private like FoundationDB. I'm assuming GNU and Apache is a good thing?

How is RethinkDB licensed?

The RethinkDB server is licensed under the GNU Affero General Public License v3.0. The client drivers are licensed under the Apache License v2.0. http://rethinkdb.com/faq/

[+] coffeemug|11 years ago|reply
Slava, CEO @ Rethink here. There are two aspects that you should consider.

Firstly, as Daniel pointed out, RethinkDB is licensed under AGPL. An acquirer wouldn't have the legal means to close the source code, and with over 700 forks on GitHub they also couldn't do it practically.

But beyond licensing, consider our personal motivations. We've been working on RethinkDB for five years, and had quite a few opportunities to sell the company. We turned them all down because we really believe in the product. The world is clearly moving towards realtime apps, and we feel it's extremely important for open realtime infrastructure to exist. It's easy for people to make promises about the future, but consider this from a game-theoretic point of view. If we wanted to sell, we could have done it long ago. I know it's not a guarantee, but hopefully it's a strong signal to help with your decision.

(Also, there are lots of really interesting companies building products on RethinkDB that we can't talk publicly about yet. It would be silly to sell given that momentum)

[+] ifcologne|11 years ago|reply
FoundationDB was a closed-source database. It was never open-source.

They had only open-sourced the SQL layer on top of their key/value store, and it's still available on GitHub. The reason: they built it based on open-source code.

When someone deletes a public repository on Github, one fork remains as the new master. (Here's FoundationDB's SQL-Layer: https://github.com/louisrli/sql-layer)

So: RethinkDB will stay, even if someone tries to pull the plug. Just fork them on Github. :)

[+] danielmewes|11 years ago|reply
Daniel @ RethinkDB here. As you mention, RethinkDB is fully open source, so it is always going to remain freely available.
[+] notdonspaulding|11 years ago|reply
Cool!

I've started to look into RethinkDB in the past, and I'm very interested in the features it claims. However, I only have so much time to investigate new primary storage solutions, and our team has been burned in the past by jumping too quickly on a DB's bandwagon when the reliability, performance, or tooling just wasn't there.

As of late, we've come to rely on Aphyr's wonderful Call Me Maybe series[0] as a guide for which of a DB's claims are to be trusted and which aren't. But even when Aphyr hasn't tested a particular DB himself, some projects choose to use his tool Jepsen to verify their own claims. According to at least one RethinkDB issue on GitHub, RethinkDB still hasn't done that[1].

Not to poo poo on the hard work of the RethinkDB team, but for me, the TL;DR is NJ;DU (No Jepsen, Didn't Use)

[0] https://aphyr.com/tags/jepsen

[1] https://github.com/rethinkdb/rethinkdb/issues/1493

[+] coffeemug|11 years ago|reply
Slava @ Rethink here.

This is a great point, and we're on it! We have a Raft implementation that unfortunately didn't make it into 2.0 (these things require an enormous amount of patient testing). The implementation is designed explicitly to support robust automatic failover, no interruptions during resharding, and all the edge cases exposed in the Jepsen tests (and many issues that aren't).

This should be out in a few months as we finish testing and polish, and will include the results of the Jepsen tests. (It's kind of unfortunate this didn't make it into 2.0, but distributed systems demand conservative treatment).

[+] danielmewes|11 years ago|reply
Understood. We're planning to test with Jepsen soon. This will happen once we have implemented fully automatic failover (at the moment it still requires manual intervention, even though it's usually straightforward). We have a first working implementation, but are still working on the details. It should be ready in the next ~2 months.

See the issue you mentioned https://github.com/rethinkdb/rethinkdb/issues/1493 for progress on this.

[+] cdnsteve|11 years ago|reply
I'm going to give this a spin out of pure respect for the team that's dedicated 5 years to a product without cashing out. Hats off. Your CEO has some respectable... anatomy.
[+] geddski|11 years ago|reply
I've been using RethinkDB for a while now and I really enjoy working with it. It's a great fit for React and Angular 2 apps with their one-way data flow through the application. Hook up a store or a model to an event source (server-sent events) that streams the RethinkDB changes feed and it's just awesome and simple. Realtime shouldn't be this easy, totally feels like cheating. Love it.

I also really like the ability to do joins, where before in Mongo I would have to handle data joins in the app level.
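The pattern described above (streaming the RethinkDB changefeed to the browser over server-sent events) can be sketched roughly as below. This is a minimal illustration, not the commenter's actual code: the table name, host, and handler wiring are assumptions, and the driver calls follow the official `rethinkdb` JS client.

```javascript
// Sketch: push RethinkDB changefeed events to the browser via SSE.
// Assumes the official 'rethinkdb' JS driver and a running server;
// the 'messages' table and connection details are illustrative.

// Pure helper: format one changefeed document as an SSE frame.
function toSSE(change) {
  return 'data: ' + JSON.stringify(change) + '\n\n';
}

// Hypothetical HTTP handler wiring (not executed here; needs a live server).
function streamChanges(res) {
  const r = require('rethinkdb'); // official JS driver
  res.writeHead(200, { 'Content-Type': 'text/event-stream' });
  r.connect({ host: 'localhost', port: 28015 }, (err, conn) => {
    if (err) throw err;
    r.table('messages').changes().run(conn, (err, cursor) => {
      if (err) throw err;
      cursor.each((err, change) => {
        // Each change is a {old_val, new_val} document.
        if (!err) res.write(toSSE(change));
      });
    });
  });
}
```

On the client, an `EventSource` pointed at this endpoint would feed each frame straight into a React/Angular store, which is what makes the one-way data flow feel so simple.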

[+] e12e|11 years ago|reply
How do you deal with user authentication, authorization and data encryption? Do you have a web server/application server or do you just combine static js/html/css resources and RethinkDB?

I'm kind of enamoured with the idea of couchapps -- but I'm still not entirely comfortable with having my db be my web and app server, as well as having it manage passwords etc... as I'm reading up, I'm slowly convincing myself it's possible to both make it work, be easy, support a sane level of TLS, load balance and be secure with proper ACL support... but very few tutorials/books seem to really deal with that to a level that brings me confidence.

[+] nileshtrivedi|11 years ago|reply
Do you have any project on github that works like that?
[+] evo_9|11 years ago|reply
Now if only Meteor would support this all would be good in the world.
[+] GordyMD|11 years ago|reply
+1

RethinkDB's realtime capabilities would fit perfectly with Meteor.

[+] Xorlev|11 years ago|reply
Congrats on the 2.0! It's been interesting to watch as a project.

Do you expect that as you stabilize you'll officially support more drivers? Or are you going to leave that as a community effort?

[+] coffeemug|11 years ago|reply
Slava @ Rethink here.

We're planning to take the most well-supported community drivers under the RethinkDB umbrella (assuming the authors agree, of course). It will almost certainly be a collaboration with the community, but we'll be contributing much more to the community drivers, supporting the authors, and offering commercial support for these drivers to our customers.

[+] mping|11 years ago|reply
Does anyone have some numbers on performance? I tried RethinkDB 1.x and the performance wasn't quite there yet, especially bulk imports and aggregations.
[+] coffeemug|11 years ago|reply
We'll be publishing a performance report soon (we didn't manage to get it out today).

Rough numbers you can expect for 1KB documents in a 25M-document database: 40K reads/sec/server, 5K writes/sec/server, with roughly linear scalability across nodes.

We should be able to get the report out in a couple of days.

[+] jfolkins|11 years ago|reply
I contributed the benchmarks to Dan's gorethink driver. Dan is great to collaborate with so if you want to hack on Go and contribute to OSS, consider giving his project a look.

One way to improve writes is to batch them; an example is here:

https://github.com/dancannon/gorethink/blob/master/benchmark...

I believe the RethinkDB docs state that 200 is the optimal batch size.

Another way is to enable the soft durability mode.

http://rethinkdb.com/api/javascript/insert/

"In soft durability mode RethinkDB will acknowledge the write immediately after receiving and caching it, but before the write has been committed to disk."

https://github.com/dancannon/gorethink/blob/master/benchmark...

Obviously your business requirements come into play. I prefer hard writes because my data is important to me, but I do insert debug messages using soft writes in one application I have.

*Edit: Heh, I forgot to mention, on my MacBook Pro I was getting 20k w/s while batching and using soft writes.

Individual writes for me are hovering around 10k w/s on the 8-CPU, 24GB instance I have. But yeah, define your business reqs, then write your own benchmarks and see if the need is met.

Many devs write benchmarks in order to be the fastest and not the correctest. Super lame.
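The two techniques in the comment above (batches of 200 and soft durability) can be combined in a short sketch. This is illustrative only: the table name is an assumption, the `durability: 'soft'` option is from the JS API docs linked above, and the batch size of 200 follows the comment.

```javascript
// Sketch: batched inserts with soft durability, per the comment above.
// Uses the official 'rethinkdb' JS driver; 'events' is an illustrative table.

// Pure helper: split documents into batches of a given size.
function chunk(docs, size) {
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Not executed here: requires a live RethinkDB server. Assumes a driver
// version where run() returns a promise when no callback is given.
async function bulkInsert(conn, docs) {
  const r = require('rethinkdb');
  for (const batch of chunk(docs, 200)) {
    // Soft durability: acknowledged once cached, before the disk commit.
    await r.table('events').insert(batch, { durability: 'soft' }).run(conn);
  }
}
```

As the commenter notes, soft durability trades crash safety for throughput, so it suits debug logs better than business-critical data.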

[+] mberning|11 years ago|reply
For the rubyists out there check out http://nobrainer.io/
[+] sandstrom|11 years ago|reply
Is anyone using nobrainer in production?

We're currently using Mongoid (a MongoDB ORM), and an Active Record-like ORM for RethinkDB is the main thing holding us back.

I don't have great insight into nobrainer, but last I checked it seemed like joins weren't implemented (but on the roadmap).

[+] expando|11 years ago|reply
Selling support is a great non-intrusive business model.
[+] ThinkBeat|11 years ago|reply
Except that it incentivises a company to build a product that requires continuing support.

That can be a good thing or a bad thing.

[+] xtrumanx|11 years ago|reply
Lots of congratulating on this thread and a hell of a lot of points for a software release. I've been on HN consistently for a long while and I didn't realize there was so much love and hype for RethinkDB here.

Have I missed something?

[+] andrewflnr|11 years ago|reply
I guess you have. There are a lot of us into alternative databases that are hoping for Rethink to fulfill the original promise of MongoDB. That said, I can't blame you for not devoting a bunch of attention to it. :)
[+] robertfw|11 years ago|reply
I've been following RethinkDB on HN for quite a while now and have been eagerly awaiting them to make a production-ready statement. Everything I have read has sounded very promising and I am excited to try it out!
[+] dkhenry|11 years ago|reply
Awesome news. I have used Rethink for a few internal projects, and while I don't think it has that one "killer feature" that other DBs don't, it is such a painless experience in development and deployment that it's just worlds better than trying to set up and scale some of the other solutions.

BZ rethinkdb team.

[+] kolencherry|11 years ago|reply
Congrats on the 2.0 release! Changefeeds are an incredibly powerful feature. We're looking forward to the next release with automagic failover!
[+] _dancannon|11 years ago|reply
Congratulations, been looking forward to this release for a while!
[+] straik|11 years ago|reply
I think this is a good place to say thank you for your work on the Go Rethink driver. It's a clearly written, easy-to-follow, and effective piece of code.
[+] billclerico|11 years ago|reply
congrats Slava, Mike & team. in an age of thin apps getting shipped in weeks or months, the patience you showed in spending 5 years developing some pretty hard-core technology is amazing. really excited for you guys!
[+] gauravphoenix|11 years ago|reply
any plans of releasing officially supported Java driver? For most enterprise oriented apps, having officially supported Java driver will be great.
[+] coffeemug|11 years ago|reply
Yes! No ETA yet, but we're on it.
[+] dorfsmay|11 years ago|reply
Does RethinkDB have a concept of transactions? My question is actually about restoring a lost node... If a node is rebooted, is all the data for its shards going to be sent again? Or just the delta?

Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

[+] coffeemug|11 years ago|reply
> If a node is rebooted, is all the data for its shards going to be sent again? Or just the delta?

Just the delta. We built an efficient, distributed BTree diff algorithm. When a node goes offline and comes back up, the cluster only sends a diff that the node missed.

> Similarly if I have to rebuild a node from scratch, is there a way to prime it so that a massive copy of all the data in the cluster gets copied to it from the other nodes?

You don't have to do that, it happens automatically. You can have full visibility and control into what's happening in the cluster -- check out http://rethinkdb.com/docs/system-tables/ for details on how this works.
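Querying those system tables can be sketched as below. This is a hypothetical example, not Slava's code: the `table_status` system table and its `status.all_replicas_ready` field are from the system-tables docs linked above, but the exact document shape assumed here should be checked against that page.

```javascript
// Sketch: checking cluster health via RethinkDB's system tables,
// as the docs page above describes. Document shape is an assumption.

// Pure helper: does a table_status document report all replicas ready?
function allReplicasReady(statusDoc) {
  return Boolean(statusDoc.status && statusDoc.status.all_replicas_ready);
}

// Not executed here: requires a live cluster.
function checkCluster(conn, done) {
  const r = require('rethinkdb');
  r.db('rethinkdb').table('table_status').run(conn, (err, cursor) => {
    if (err) return done(err);
    cursor.toArray((err, docs) => {
      if (err) return done(err);
      // true only when every table reports all replicas ready.
      done(null, docs.every(allReplicasReady));
    });
  });
}
```

A script like this could poll during a resharding or node restart to watch the automatic backfill complete.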

[+] nickstinemates|11 years ago|reply
Big fan of RethinkDB. Use it in all of my projects these days.
[+] vonklaus|11 years ago|reply
What were you using before? What are the pros and cons of the switch?
[+] DAddYE|11 years ago|reply
I'm very happy to see this milestone. Even though I haven't used it recently, I remember 2-3 years ago we tried it (adtech) for some heavy production workloads. Even though we chose another product (Cassandra), I was literally surprised by how well it performed! Congrats!