top | item 3899080

Rails 4 will establish a new background job queueing API

172 points | jroes | 14 years ago | github.com

57 comments

[+] nateberkopec|14 years ago|reply
Thanks for the accurate title - this is not a full queueing system, but a unified API for hooking in bigger, badder queueing engines like Resque.

The point is to standardize the interface so other plugins/gems can simply make calls to Rails.queue rather than try to accommodate every queueing engine themselves.

[+] phillmv|14 years ago|reply
Someone please correct me if I am wrong.

Skimming through the code, this lets you register a Queue class to serialize your jobs. So, if you use something like Delayed Job, you register the (corresponding) DJ::Queue class that stores the jobs in whatever backend you desire and then process it later via your daemon of choice.

So far so peachy keen. This is alright, I can get behind this - it will make moving between queueing solutions more palatable which is not a feature I can complain about.

My question then is: how will this work by default? Will the default Queue have some sort of callback that executes after it returns the response? For stuff like sending emails, for small apps, this is actually palatable - I'm more concerned about user latency than sheer requests/second.

[+] unknown|14 years ago|reply

[deleted]

[+] shill|14 years ago|reply
It would be awesome to have this in Django too.

Celery already does a great job, but it would be nice to have the batteries included.

[+] simonw|14 years ago|reply
I agree - I see this as similar to Django's pluggable caching backends.
[+] aaronjg|14 years ago|reply
I'd love to see these job queuing platforms have better support for high performance computing (HPC). Currently there are two paradigms of queuing systems: things like PBS/Torque and Sun/Oracle/Univa Grid Engine, which work very well for small numbers of largish batch jobs, and things like Delayed Job, Background Job and Resque, which work well for huge numbers of small jobs.

When you start dealing with large jobs, system resources start to become an issue. A job might take 48GB of memory, or it might take 1GB of memory, and the scheduler needs to be aware of this so that it isn't scheduling jobs on top of each other. Or you might have some low priority jobs that should only be run when the queue is mostly empty so as not to compete with the high priority jobs. Or you might have jobs that depend on other jobs, and you want to enqueue them all and let the scheduler handle the dependencies. HPC schedulers deal with these requirements well.
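As a rough illustration of what "resource-aware" scheduling means here, the Ruby sketch below picks the jobs that may start on one node given free memory, priorities, and dependencies. It is not how PBS/Torque or Grid Engine work internally: the `Job` struct, the `schedulable` helper, and the sample jobs and numbers are all hypothetical and exist only to show the three constraints together.

```ruby
# Hypothetical job description: name, memory need in GB, priority
# (higher runs first), and the names of the jobs it depends on.
Job = Struct.new(:name, :memory_gb, :priority, :depends_on)

# Pick the pending jobs that may start now: every dependency has finished
# and the job fits into the memory still free. Higher priority goes first.
def schedulable(pending, finished, free_memory_gb)
  ready = pending.select do |job|
    job.depends_on.all? { |dep| finished.include?(dep) }
  end

  picked = []
  ready.sort_by { |job| -job.priority }.each do |job|
    next if job.memory_gb > free_memory_gb # would overcommit the node
    picked << job
    free_memory_gb -= job.memory_gb
  end
  picked
end

# Sample workload (made-up numbers).
JOBS = [
  Job.new("align",  48, 10, []),
  Job.new("report",  1,  5, ["align"]),
  Job.new("thumbs",  1,  1, []),
  Job.new("index",  32,  8, []),
]

# On a node with 64 GB free and nothing finished yet.
FIRST_WAVE = schedulable(JOBS, [], 64).map(&:name)
```

With these numbers, the 48GB job is placed first, the 32GB job has to wait because it no longer fits, and the small independent job fills the gap. A typical web background queue skips this bookkeeping entirely, which is part of why it can add and remove jobs so quickly.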

On the other hand, you might be in a situation where you have tens of thousands of jobs in the queue, and you need to add and remove jobs quickly. Things like Resque and Delayed Job handle these situations well.

HPC schedulers were built for research purposes, and background job schedulers were built for web applications. However, there are more and more companies dealing with large data problems that span both worlds. They have some large jobs and tons of small jobs, and they don't want to manage two separate clusters with two schedulers to handle the tasks.

[+] danneu|14 years ago|reply
I plucked the relevant points of discussion that reveal the thought process.

Q: "I've heard for years that pagination should remain outside rails since it has to be lightweight, and now that !?"

homakov: good example, but "pagination" is a design-related thing(like decal on a car) but "queue" or delayed jobs(jquery-deferred for example) is deep engine built in feature. As cars vendor You shouldn't choose decals for driver but you should install the best and reliable stuff under its hood IMO

...

Q: What's the point?

josevalim: The point of the Queue is to be small and provide an API that more robust engines like resque and sidekiq can hook in. So you can easily start with an in memory queue (as you can see, the implementation does not even reach 100LOC) which is also easy to test and then easily swap to another one. Why this is good? By having an unified API, tools like Devise, Action Mailer can simply use Rails.queue.push() instead of worrying with compatibility for different plugins. So the goal here is provide an API for queueing and with a simple in memory implementation. It is not meant to be a robust queue system.

...

Q: Why not make it into a gem?

josevalim: The implementation today is less than 100LOC, so there is no reason to move it to an external gem. If the implementation actually grows a lot, which I highly doubt, we can surely consider moving it to a gem.

...

Q: Why include it in Rails at all?

DHH: This is really very simple: Do most full-size Rails applications, think Basecamp or Github, need to use a queue? If the answer is yes, and of course it is, this belongs in Rails proper.

...

Q: Then, and I'm not just trolling, should Rails provide an API for user authentication or authorization?

DHH: authentication, pagination, etc are all application-level concerns -- not infrastructure. Think Person model vs ActiveRecord model. Another way to think of it is, would two applications have materially different opinions on queue.push depending on what they're doing? The answer is no. That is not the case for authentication, pagination, and other application-level concerns where the usage is often very different depending on what the application is trying to do.

...

Q: Is Rails getting too big?

DHH: The size of Rails itself is not a first-order metric of neither progress nor decline. The right question is: Does Rails solve more common problems than before without making the earlier solutions convoluted? In other words, what are the externalities of progress? Will introducing a queue API make it harder to render templates? Or route requests? No. It's most direct influence will be on things like ActionMailer, so a fair question will be: Is it harder or easier to use ActionMailer in a best-practice way after we get this? That's a fair question, but I'm absolutely confident that this will make using idiomatic AM usage (queuing mail delivery outside of the request cycle) much easier. Thus, progress.

[+] lkrubner|14 years ago|reply
I am curious, under what circumstances would one use this, rather than something like Resque? And there is so much competition in this space, what exactly is the argument for having this as part of Rails?

Or, let me put my question a little differently. GitHub did an awesome job writing about their experiences, and the reasoning that led them to create Resque. I'm wondering if anyone on the Rails team has posted an essay with as much background info as what GitHub did here:

https://github.com/blog/542-introducing-resque

But I'm also thinking about a conversation that happened here on Hacker News recently. 2 weeks ago: "Rails core killed ActiveResource"

http://news.ycombinator.com/item?id=3818223

and the original article touches upon the issue that I'd like to ask about here:

"It's not that I hate you or anything, but you didn't get much attention lately. There're so many alternatives out there, and I think people have made their choice to use them than you. I think it's time for you to have a big rest, peacefully in this Git repository."

Can't something similar be said about job queues? "There're so many alternatives out there, and I think people have made their choice to use them than you."?

So why create a new job queue system, and make it an official part of Rails? I am not sure I understand the intent.

[+] pdelgallego|14 years ago|reply
> I am curious, under what circumstances would one use this, rather than something like Resque? And there is so much competition in this space, what

The goal is not to replace the existing queue solutions, but to create a common API, so the rest of the gems can just treat all of them in a uniform way.

Quoting Jose Valim:

"The point of the Queue is to be small and provide an API that more robust engines like resque and sidekiq can hook in. So you can easily start with an in memory queue (as you can see, the implementation does not even reach 100LOC) which is also easy to test and then easily swap to another one.

Why this is good? By having an unified API, tools like Devise, Action Mailer can simply use Rails.queue.push() instead of worrying with compatibility for different plugins.

So the goal here is provide an API for queueing and with a simple in memory implementation. It is not meant to be a robust queue system. "
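To make the idea in the quote concrete, here is a minimal Ruby sketch of a single queue interface with an in-memory default. It is not the code from the Rails commit: `InMemoryQueue`, `WelcomeEmailJob`, `App.queue`, and the `run` and `drain` methods are hypothetical names picked for illustration. The only thing taken from the discussion is that callers hand jobs to one `push` call and do not care what sits behind it.

```ruby
# Hypothetical in-memory backend. Anything that responds to #push could
# replace it, for example an adapter wrapping Resque or Sidekiq.
class InMemoryQueue
  def initialize
    @jobs = Queue.new # thread-safe FIFO from Ruby's core library
  end

  # Enqueue any object that responds to #run (an assumed job contract).
  def push(job)
    @jobs.push(job)
    self
  end

  # Run every job queued so far, in order, and return how many ran.
  # A real backend would do this in a worker thread or process instead.
  def drain
    count = 0
    until @jobs.empty?
      @jobs.pop(true).run # non-blocking pop; the queue is not empty here
      count += 1
    end
    count
  end
end

# Hypothetical job that records a "sent" mail in an outbox array.
class WelcomeEmailJob
  def initialize(address, outbox)
    @address = address
    @outbox = outbox
  end

  def run
    @outbox << "welcome mail for #{@address}"
  end
end

# Stand-in for a framework-level accessor such as the proposed Rails.queue.
module App
  class << self
    attr_accessor :queue
  end
end

App.queue = InMemoryQueue.new
OUTBOX = []

# Library code only ever does this, whatever the configured backend is.
App.queue.push(WelcomeEmailJob.new("user@example.com", OUTBOX))
```

Swapping backends then means assigning a different object to `App.queue`; code that calls `push` stays untouched, which is the compatibility argument made for gems like Devise and Action Mailer.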

[+] simonw|14 years ago|reply
It looks like this is meant to be an interface with multiple backend implementations, so Resque would become one of the potential backends.

I see this as a similar thing to having an interface for caching which can then be backed by memcached, redis or the filesystem. It strikes me as an excellent idea - pretty much every web application should have an offline queue of some sort these days.

[+] jmcnevin|14 years ago|reply
I don't believe the intent here is to replace Resque (Resque is awesome), but to provide a slim API at Rails.queue that Resque/Delayed Job/BackgrounDRb/TorqueBox/etc. could tie into, similar to how Rails.cache works now, in addition to adding a simplistic default implementation.

Considering Rails has always been about best practices--and background job queueing is definitely a best practice--I think this is a great move.

This will also allow other gems/plugins to have an easy way to push their own jobs into the queue rather than trying to support a bunch of different queue implementations.

[+] route66|14 years ago|reply
As I understand the discussion (underneath the commit log, josevalim gives a comment), it's not about re-inventing a job queue but to offer an API for queues where you can hook in what you want. That way other services can use a queue (sending mails, processing frobnicates) through an advertised interface without having to rely on a specific implementation. You still can run resque behind it. (Caveat: I only read the discussion, this is not informed by interpreting the code)
[+] technoweenie|14 years ago|reply
GitHub doesn't actually use Resque directly (well, except some rare cases). defunkt built RockQueue to be our internal queue interface while he migrated the app from DelayedJob to Resque. This looks like the same concept.
[+] sheff|14 years ago|reply
Rails 4 looks like it will have some nifty features - does anyone have any information on when the first Release Candidate will be?
[+] xutopia|14 years ago|reply
I heard that it would be when it is ready :-P
[+] dancesdrunk|14 years ago|reply
A feature that may very well make me finally jump over to RoR. I've recently built quite a large site, and the only current bottleneck is when a few emails need to be sent off at the same time with attachments. Being able to add that to a "queue" and let the user continue browsing the site instead of being stuck on a loading page (if only for a few seconds) would make the current setup ideal.

Incidentally - if anyone has any way of doing this in PHP without having to set up cron jobs (and not using node or its derivatives), I'm really open to any ideas!

[+] gmack|14 years ago|reply
Although certainly not without its issues, the most popular solution for that platform is Gearman (http://gearman.org). It's fairly ops-intensive, but the most friendly for PHP without having to resort to things like Stomp to interface with messaging (MQ) systems, which are not optimally designed for job enqueuing, per se.
[+] mibbitier|14 years ago|reply
Why wouldn't you want to use a DB queue or something, and have a separate cron job or process handle the outgoing email?
[+] reitzensteinm|14 years ago|reply
I too rolled my own, and while trivial to create, it's always made me uneasy. If there's a bug, I won't know about it; Amazon SES will reject the emails if they're sent all at once, or perhaps the calls won't be made at all.

I ended up doing a little status page for my newsletter; I set it up to auto refresh in Opera, each one of the refreshes sends 10 emails, and prints their statuses/destination/titles as they go (it's also rate limited in memcached). I chuck that on the laptop or a third monitor and leave it for a couple of hours, keeping an eye on it as it goes.

Using something off the shelf I could trust would be much nicer.

[+] there|14 years ago|reply
Just move your code to a register_shutdown_function() call and it will execute after the output has been sent, but without having to deal with forking a background PHP process or running out-of-context.
[+] donw|14 years ago|reply
Rather than doing this using a queue in the webapp, why not let the on-host mailserver handle it for you?
[+] statictype|14 years ago|reply
I don't use Rails but I often look to it for good/simple design ideas. I'm interested in seeing how they implement simple, effective, reliable background queuing.
[+] edbloom|14 years ago|reply
Interesting - not sure if it's really needed though - I've used Redis and Resque before and found its performance was blisteringly fast. (Resque was made by GitHub: https://github.com/blog/542-introducing-resque)
[+] bradly|14 years ago|reply
It isn't about speed or choice of queue, it's about a standardized API for working with queues so you can focus on developing your application domain. You will still be able to use Resque or Sidekiq or DJ or anything else, there will just be a standard API for all of them to use.
[+] ecoffey|14 years ago|reply
I think the point is to provide an abstraction layer, so that the community has some common feature set and protocol expectations when we're discussing different technical solutions to queuing.
[+] rjsamson|14 years ago|reply
I'm sure there will be plenty of folks raging against it, but I for one am glad to see the addition.
[+] thibaut_barrere|14 years ago|reply
Coupled with that, I would love to see Passenger support background workers with the same lifecycle as front-end workers (but last time I suggested that, it wasn't planned at all if I remember well).
[+] pkmiec|14 years ago|reply
We implemented something like this at the place I work.

We have a tiny ruby process, based on event machine, that subscribes to various queues (we happen to use RabbitMQ). When a message arrives, the process makes a request to the passenger instance passing along the message data and waits for a response. The process limits the number of requests it makes to prevent background requests from blocking out front-end requests (for example, 20% of passenger_max_pool_size). We're also simulating priority by using different prefetch values for different queues (for example, 10 messages for high queue and 5 messages for low queue).
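The cap on background requests described above can be sketched in Ruby with a `SizedQueue` used as a counting semaphore. This is only an illustration, not the poster's EventMachine code: `BackgroundLimiter`, `with_slot`, the pool size of 20, and the sleeping stand-in for an HTTP request are all assumptions, and the RabbitMQ prefetch part is left out.

```ruby
# Hypothetical limiter: at most `limit` background requests in flight,
# where limit is a percentage of the app server's worker pool.
class BackgroundLimiter
  attr_reader :limit

  def initialize(pool_size, percent)
    @limit = [pool_size * percent / 100, 1].max # integer math, at least 1
    @slots = SizedQueue.new(@limit)             # push blocks when full
  end

  # Take a slot, run the block, and always give the slot back.
  def with_slot
    @slots.push(:token)
    begin
      yield
    ensure
      @slots.pop
    end
  end
end

# Assumed numbers: a pool of 20 workers, 20% reserved for background work.
LIMITER = BackgroundLimiter.new(20, 20)

lock = Mutex.new
in_flight = 0
PEAK = [0] # highest number of simultaneous "requests" observed

threads = 12.times.map do
  Thread.new do
    LIMITER.with_slot do
      lock.synchronize do
        in_flight += 1
        PEAK[0] = [PEAK[0], in_flight].max
      end
      sleep 0.01 # stand-in for the request to the app server
      lock.synchronize { in_flight -= 1 }
    end
  end
end
threads.each(&:join)
```

However many messages arrive at once, `PEAK[0]` never exceeds `LIMITER.limit`, so front-end requests always keep the rest of the pool.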

[+] ecoffey|14 years ago|reply
That's awesome to see. This was part of tenderlove's keynote.
[+] dirkdk|14 years ago|reply
Separating the API from the actual implementation is always a good thing.

As in Java, where the JMS API has many implementations.

[+] endlessvoid94|14 years ago|reply
This strikes me as something that should be decoupled from rails.
[+] j45|14 years ago|reply
Maintaining interoperability between plugins (gems etc) is a perpetual headache.

In some ways, I'm surprised this hasn't been there all along..

On the other hand I'm a little surprised something this simple is being celebrated as a big deal.

It's nice to see Rails continue to evolve; time will tell how much it ends up looking like the over-arching frameworks it was out to under-do.

[+] awj|14 years ago|reply
...why? This isn't supposed to be the one true rails queueing implementation, just a standard interface to code against. Without it third party libraries have to resort to all sorts of ugly workarounds to push work into the background ... or just not provide that feature. With this in place third party tools can push into the queue without knowing or caring what specific kind of queue you're using.
[+] andyl|14 years ago|reply
Great news. A built-in background job queue should reduce the rails learning curve - simpler to use a default option than research and test the various custom options that are available now.