Adventures in message queues

[+] ChuckMcM|11 years ago|reply

In my experience there are three things that will break here;

1) At-most-once is a bridge to an elementary school which has an inter-dimensional connection to a universe filled with pit vipers. Kids will die, and there is nothing you can do to stop it.

2) Messages are removed when acknowledged or memory pressure forces them to be kicked out. Black Perl messages, those that sail in from out of nowhere, and lonely widows (processes that never find out their loved ones are dead) will result.

3) Messages are ordered using wall clock millisecond time. This will leave your messages struggling to find their place in line and messages that should be dead, not be dead (missing fragment problem).

Obviously all these are simply probabilistic trade-offs based on most likely scenarios which result in arbitrarily small windows of vulnerability. No window is small enough at scale over time.

Often when these things have bitten me it has been non-programming stuff. For example a clock that wouldn't follow NTP because it was too far ahead of what NTP thought the time was, an operator fixed that by turning time back 8 seconds. A client library that was told messages arrive at most one time, and so made a file deletion call on the arrival of a message, a restored node holding that message managed to shoot it out before the operator could tell it that it was coming back from a crash, poof damaged file. And one of my favorites in ordering, a system that rebooted after an initial crash (resetting its sequence count) and getting messages back into flight with the wrong sequence number but with legitimate sequence values. FWIW, these sorts of things are especially challenging for distributed storage systems because files are, at their most abstract, little finite state machines that walk through a very specific sequence of mutations the order of which is critical for correct operation.

My advice for folks building such systems are never depend on the 'time', always assume at-least-once, and build in-band error detection and correction to allow for computing the correct result from message stream 'n' where two or more invariants in your message protocol have been violated.

Good luck!

[+] antirez|11 years ago|reply

Hello, 1 / 2 are not going to be problems for sure because:

1) At most once must be clearly asked for with the right API call, the default is at least once, and the API has a protection to avoid an error by mistake, that is, "retry=0" is only accepted if you specify a single copy for the message, otherwise an error is returned:

> addjob myqueue myjob 0 retry 0

(error) ERR With RETRY set to 0 please explicitly set REPLICATE to 1 (at-most-once delivery)

2) No active messages are removed on memory pressues, only acknowledged messages (already processed at least one time) that are still not garbage collected since not every node that may have a copy was reached. This means that Disque by default try hard to avoid spurious multiple deliveries, but on memory pressure it only sticks the main guarantee (that is, at least once delivery).

3) Most problems involving a message broker are fine with "fair" ordering, that is, we are not going to deliver messages in random order. Even if there is some clock difference between nodes (for sure there is), in case multiple nodes are generating messages for the same queue, the queue of certain nodes with clocks more "in the past" may have under high traffic some millisecond of delay, because queues are modeled with skiplists, so messages are inserted in the right place when migrated from one node to another because of the auto-federation feature. When no message migration is involved, messages are delivered exactly in order, but the order is even in this case violated at some point because Disque re-deliver messages automatically if not acknowledged, which is a violation of ordering per se.

[+] politician|11 years ago|reply

I don't understand the purpose of at-most-once semantics in practice: So, you've got some process where you don't care if the message goes through, but you're willing spend money on the compute/storage for it anyway? Why bother?

If you're designing a system with those semantics, is that because you're hoping that it's exactly-once: 99.999% of the time -- wink, wink, nudge, nudge so that your handlers don't have to be idempotent? What's the fallback plan for the day that all of your messages go into a networking blackhole?

[+] ma2rten|11 years ago|reply

How does something like rabbitmq fare here?

[+] shin_lao|11 years ago|reply

It is possible to build reliable monotonic counters based on time, and that's actually a sound approach.

[+] antirez|11 years ago|reply

I'm very sorry, credits for the questions goes to Jacques Chester, see https://news.ycombinator.com/item?id=8709146 I made an error cut&pasting the wrong name of Adrian (Hi Adrian, sorry for misquoting you!). Never blog and go to bed I guess, your post may magically be top news on HN...

[+] pixelmonkey|11 years ago|reply

Seems like a similar design to Apache Kafka, http://kafka.apache.org. AP, partial ordering (Kafka does ordering within "partitions", but not topics).

One difference is that Disque "garbage collects" data once delivery semantics are achieved (client acks) whereas Kafka holds onto all messages within an SLA/TTL, allowing reprocessing. Disque tries to handle at-most-once in the server whereas Kafka leaves it to the client.

Will be good to have some fresh ideas in this space, I think. A Redis approach to message queues will be interesting because the speed and client library support is bound to be pretty good.

[+] andrea_s|11 years ago|reply

Maybe I'm missing something, but if it is important to guarantee that a certain message will be dispatched and processed by a worker, why wouldn't a RDBMS with appropriate transactional logic be the best solution?

[+] Spearchucker|11 years ago|reply

It would. There's an argument that something like only once, guaranteed and in-order delivery are business requirements which therefore have no place in the message layer - http://www.infoq.com/articles/no-reliable-messaging.

[+] acolyer|11 years ago|reply

Credit for the questions is due to jacques_chester, not me! See https://news.ycombinator.com/item?id=8709146

[+] turingbook|11 years ago|reply

>a few months ago I saw a comment in Hacker News, written by Adrian Colyer...was commenting how different messaging systems have very different set of features, properties, and without the details it is almost impossible to evaluate the different choices, and to evaluate if one is faster than the other because it has a better implementation, or simple offers a lot less guarantees. So he wrote a set of questions one should ask when evaluating a messaging system.

I can not find the comment by @acolyer on HN. Who can help me?

[+] discardorama|11 years ago|reply

I think Salvatore mis-remembered. The comment was by @jacques_chester (he made a joke about it below, but people didn't get it). The comment is in this thread (HN doesn't let me link directly): https://news.ycombinator.com/item?id=8708921 . Look for Jacque's comment in there, and Salvatore replied to it.

I also wanted to find the original, and used Jacque's remark below as a starting point to start hunting for it.

[+] caf|11 years ago|reply

I wonder what the point is in having "best effort FIFO"? If the application has to be able to deal with unordered messages anyway, you might as well not bother to try to maintain any kind of order.

It's as well to be hung for a sheep as for a lamb.

[+] ekimekim|11 years ago|reply

In general, you want to consume messages "fairly" ie. in a way that minimizes latency introduced by the queuing. Best-effort ordering gives you this most of the time, which is better than none of the time.

[+] latch|11 years ago|reply

It lets the consumer use a fixed buffer to re-introduce ordering (possibly dropping any out of order messages that come outside of a window)

[+] mappu|11 years ago|reply

Ask HN: I'm in the market for a distributed message queue, for scheduling background tasks -

Does anything support "regional" priorities, where jobs are popped based on a combination of age + geographic/latency factors?

Also, what are recommended solutions for distributing job injection? My job injection is basically solely dependent on time, and so i envisage one node (raft consensus?) determining all jobs to inject into the queue.

My queue volume is about 50 items/sec and nodes will be up to 400ms apart.

[+] Rapzid|11 years ago|reply

That sounds like a job for a scheduler, in the cluster sense. I'm actually looking around for a job framework in .NET that supports custom schedulers, but have yet to find something that supports resource constrained scheduling. It's all either about en-queueing something to be done NOW, or at a future date. I haven't seen anything that supports custom scheduler implementations on a per-job type basis. They don't really distinguish between logging work to be done and deciding whether or not it can be executed NOW.

[+] isb|11 years ago|reply

This looks very cool. At-least once semantics are the way to go because most tasks require idempotence anyway and that helps in dealing with multiple delivery. Strict FIFO ordering is not always needed either as long as you avoid starvation - most of the time you need "reliable" deferred execution ("durable threads").

I started prototyping something along these lines on top of riak (it is incomplete - missing leases etc but that should be straightforward to add): https://github.com/isbo/carousel It is a durable and loosely FIFO queue. It is AP because of Riak+CRDTs. It is a proof of concept - would be nice to build it on top of riak_core instead of as a client library.

[+] jtchang|11 years ago|reply

When I first installed Redis years ago I was astounded at how easy it was to get up and running. Compare this to the plethora of message brokers out there: the vast majority you will spend the better half of the day trying to figure out how to configure the damn thing.

My overall impressions with message brokers is that RabbitMQ is a pain in the ass to setup, celery is my go to these days with beanstalkd being a close second if I don't want too many of celery's features.

[+] lobster_johnson|11 years ago|reply

RabbitMQ is ridiculously trivial to set up in a clustered configuration. Even more advanced things like HA policies and user permissions are super easy to do.

RabbitMQ's problem is that has no tolerance for network partitions, and also tends to be very buggy in the presence of such.

[+] arturhoo|11 years ago|reply

Agreed, I've used beanstalkd in multiple projects/companies and its simplicity, robustness and especially stability has always impressed me.

There are client drivers for a wide range of languages including Python and Ruby (and also a ActiveJob compatible library).

Installing it is simple as downloading the source, running make and then the binary. http://kr.github.io/beanstalkd/

[+] yen223|11 years ago|reply

Celery isn't a message broker - they always recommend using Celery as a consumer for a RabbitMQ broker.

[+] Gigablah|11 years ago|reply

Is that so? RabbitMQ on a single node is easy, you can get up and running right after installation with just the defaults.

I do agree there's a few gotchas with clustering though, such as firewall rules and inet_dist_listen_min/max, setcookie, etc.

[+] cdelsolar|11 years ago|reply

how to set up rabbitmq:

1) sudo apt-get install rabbitmq 2) there is no step 2

[+] sylvinus|11 years ago|reply

FYI, Salvatore will speak at dotScale in Paris about Disque on June 8: http://dotscale.io

[+] rdoherty|11 years ago|reply

This has me excited for many reasons. Redis is amazingly powerful, robust and reliable piece of technology. Also I love reading antirez's blog posts about the decisions behind Redis so I can't wait to learn more about queueing systems from him when discussing Disque.

[+] unknown|11 years ago|reply

[deleted]

[+] bcg1|11 years ago|reply

This looks like a good effort, congratulations.

Personally I'm torn on the usefulness of generic brokers for all circumstances... there are obvious advantages, but at the same time every messaging problem scales and evolves differently so a broker can quickly become just one more tail trying to wag the dog.

I am also interested in the architecture of tools like ZeroMQ and nanomsg, where they provide messaging "primitives" and patterns that can easily be used to compose larger systems, including having central brokers if that floats your boat.

[+] jraedisch|11 years ago|reply

We recently switched from RabbitMQ to Redis queuing because we were not able to implement a well enough priority queue with highly irregular workloads. Prefetch would not work since 2 minute workloads would block all following messages. Timeout queues would somewhat rebalance msgs, but large blocks of messages would be queued at the same time and therefor be processed as large blocks. Now our workers are listening to 10 queues/lists with different priorities with BRPOP and so far everything seems to work.

[+] latch|11 years ago|reply

Unodered, in-memory queues shouldn't be anyone's goto solution. I think there's a time and place for these, and having at-least-once delivery is a huge win over just using Redis, so I'm excited.

Still, unless you know exactly what you're doing, you should pick a something with strong ordering guarantees and that won't reject messages under memory pressure (although, rejecting new messages under memory pressure is A LOT easier/better to handle than dropping old messages).

[+] jpfr|11 years ago|reply

Some big project are currently making the switch to DDS-based pub/sub. [1,2]

Now that everybody is making QoS guarantees in pub/sub and message queues, is there a real difference to the 10 year old tech deployed in boats, trains and tanks?

[1] http://www.omg.org/spec/DDS/1.2/

[2] http://design.ros2.org/articles/ros_on_dds.html

[+] kweinber|11 years ago|reply

Anyone personally work on a DDS project is in actual production? I have never even seen one although I've worked in some of the industries where it is supposedly a success.

[+] arunoda|11 years ago|reply

I think this has a lot of roots from NSQ. But, NSQ has no replication support.

I think built in replication is very nice to have. Would like to try once this arrives.

[+] politician|11 years ago|reply

NSQ writes messages sent to a topic onto all channels subscribing to the topic which can be used as a form of replication in a way that meshes well with its at-least-once semantics.

[+] jacques_chester|11 years ago|reply

I wish to assure all and sundry that Adrian Colyer is not my secret crime-fighting identity, and vice versa :)

[+] antirez|11 years ago|reply

Sorry Jacques :-( Fixed... and thank you again for your comment.

[+] turingbook|11 years ago|reply

Thanks for your insightful comment at: https://news.ycombinator.com/item?id=8709146

[+] unknown|11 years ago|reply

[deleted]

[+] Lx1oG-AWb6h_ZG0|11 years ago|reply

Will there be any way to set up machine affinity? I think Azure Service Bus uses this mechanism (by specifying a partition key for a message) to enable strict FIFO for a given partition.

99 comments