This post is a useful rundown on where threads stand in Ruby now, but it hangs on a bit of a straw man argument. The argument for evented concurrency assumes threads work. Async design isn't a reaction to MRI's crappy green threads.
It's also not a very compelling argument for threading to say "web programs don't share a lot of state, so you don't have to worry about synchronization". If all you do are CRUD apps, you can indeed punt scaling to the database. That doesn't mean threads are more effective than events; it means you made concurrency and synchronization someone else's problem. There's nothing wrong with that, but it's not a convincing demonstration of threading.
As I tried to make clear throughout the post, I'm not really making an argument for a huge number of threads, or for programming that exposes a lot of shared state.
I'm making an argument about how threads are used (in real life) in web development, an area where it's trivial to make concurrency and synchronization someone else's problem. Despite this, I've heard a number of hypesters throw around the idea that this scenario is an example of where threading fails and why moving to async is required.
I agree with you that this is a weak argument, and I hope to see people understand better the difference between:
a) an application that NEEDS to handle huge amounts of concurrent users (because most of them are idle for most of their lives), and
b) an application that spends a non-trivial amount of time using the CPU, and therefore does not need more than a few threads to fully utilize the CPU
These are different cases, and while those of us with a good grasp of the subject understand the difference, a lot of people have conflated the two ideas, and then further conflated the thread-synchronization problems of each.
> The argument for evented concurrency assumes threads work. Async design isn't a reaction to MRI's crappy green threads.
Truly? To my understanding, event loops require inversion of control (and likely callbacks and broken exception handling). This is a large cost that requires a benefit to be worth it. I understand that benefit to be: you don't have to deal with threads (or bad implementations of such).
> It's also not a very compelling argument for threading to say "web programs don't share a lot of state, so you don't have to worry about synchronization".
Isn't this an issue with both models? Shared state is shared state, regardless of whether you use threads or an evented model. Unless you're only running on 1 CPU.
Ok, dumb question--I'm a bit naive about process forking. On Linux with Ruby, if I fork a process, must the new process stay on the same core? Forked processes start out sharing memory in Ruby, correct? Can anyone enlighten me on the mechanics of forking, and whether there are scenarios where it helps utilize multiple cores (maybe relatively independent workers that just read shared settings?).
Forking seems...a little weird to me coming from my previous Windows background.
Forking creates a new process that initially "shares" memory via copy-on-write semantics (and is therefore eligible to run on another core). However, C Ruby's default mark-and-sweep garbage collector writes into nearly every object on its first run to set mark bits, which forces those copy-on-write pages to be copied and eliminates most of the memory savings you'd expect from a forking model.
As I commented earlier, Ruby Enterprise Edition (made by the same guys as Passenger) ships an alternate GC that doesn't actually write to the memory space of the original objects in order to mark them.
As a result, some memory (but not necessarily as much as you'd expect) can be shared between processes forked from the same parent and running on multiple cores. Sort of a middle ground between totally separate processes and shared memory via non-GIL'ed threads.
The linux kernel will schedule the new process on whatever core it wants.
Forked processes in Ruby have POSIX semantics, i.e., NOT shared memory. Linux uses copy-on-write pages to reduce the amount of memory that has to be copied when the new process is created.
The process will be scheduled by the OS just like any other process. I think fork() is similar to CreateProcess() on Windows, but I haven't done any Windows programming that needed to create more processes.
This article is a good wake-up call, but even if Rails itself is now (since 2.2) thread-safe, I need to be fairly sure that all my libraries are before I can enable threaded request dispatch in my Rails app. That's a lot of code inspection / thorough regression testing / inevitable bugs.
The Brightbox team tried to tackle the related problem for moving to Ruby 1.9 with a crowdsourced library review site: http://isitruby19.com/ Does anyone know of a similar site where people can report whether libraries are thread safe? (The code for Is It Ruby 1.9 is available on github - http://github.com/brightbox/isitruby19 - so it would be easy enough to set up a clone, but I can't commit to maintaining such a thing.)
The way it works is that the author can specify whether or not he thinks it's threadsafe (yes/no/maybe), which is then verified by users who can specify whether they agree. If a user marks a plugin as not-threadsafe (or not-Rails 3, or not-JRuby, or not-Ruby 1.9), the author has 7 days to help the user come around before it sticks.
So far, there are 60 plugins marked as threadsafe (which means that either the author said "yes" and nobody disagreed, or the author said "maybe" and all the votes so far say yes).
If you use C Ruby, you'll need one Ruby process per core, managed by something like Passenger (mod_rails) or Unicorn.
If you use JRuby, you'll need one Ruby process per machine (for N cores), managed by the JVM.
For boxes with a lot of cores, JRuby's larger memory footprint is outweighed by the ability to share that memory across all of the cores.
This story is also somewhat complicated by Ruby Enterprise Edition, which adds copy-on-write semantics to Ruby's GC, and is built by the same guys as Passenger, making it possible to share SOME memory between processes.
With all that said, we're really talking about marginal amounts of RAM. The real takeaway is that if you're running 6 processes per core (very common), you're doing something very wrong.
FYI: At some point in the future (1.2?), Rubinius will also be able to run a single Ruby process per machine.
This blog post comes to mind: http://www.unlimitednovelty.com/2010/08/multithreaded-rails-...