top | item 8032135

(no title)

hueyp | 11 years ago

This paper describes ideas related to #2: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

But yes, your database stores the status of each message. At that point you could lose every single message in your queue and still have enough information to resend them. Each message could also have its own resend (SLA) semantics.
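A minimal sketch of the "status lives in the database" idea, using Python's built-in sqlite3 as a stand-in for your database. The table, its columns, and the helper names (record_send, mark_done, due_for_resend, mark_resent) are hypothetical; the point is only that each message row carries its own SLA, so a periodic sweep can decide what to resend without trusting the queue to hold anything.

```python
import sqlite3

# Hypothetical schema: one row per message with its status and its own
# resend window (the per-message SLA). Names are illustrative assumptions.
SCHEMA = """
CREATE TABLE messages (
    id          TEXT PRIMARY KEY,
    payload     TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending',  -- 'pending' or 'done'
    sla_seconds REAL NOT NULL,                     -- resend if not done within this window
    last_sent   REAL NOT NULL                      -- time of the most recent send
)
"""

def record_send(conn, msg_id, payload, sla_seconds, now):
    """Record the message in the database when it is put on the queue."""
    conn.execute(
        "INSERT INTO messages (id, payload, sla_seconds, last_sent) VALUES (?, ?, ?, ?)",
        (msg_id, payload, sla_seconds, now),
    )

def mark_done(conn, msg_id):
    """Called when the consumer reports the message as processed."""
    conn.execute("UPDATE messages SET status = 'done' WHERE id = ?", (msg_id,))

def due_for_resend(conn, now):
    """Pending messages whose own SLA has elapsed since they were last sent."""
    rows = conn.execute(
        "SELECT id, payload FROM messages "
        "WHERE status = 'pending' AND last_sent + sla_seconds <= ? ORDER BY id",
        (now,),
    )
    return rows.fetchall()

def mark_resent(conn, msg_id, now):
    """Restart the SLA clock after a message is put back on the queue."""
    conn.execute("UPDATE messages SET last_sent = ? WHERE id = ?", (now, msg_id))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_send(conn, "a", "charge card", sla_seconds=5, now=0)
    record_send(conn, "b", "send email", sla_seconds=60, now=0)
    record_send(conn, "c", "update index", sla_seconds=5, now=0)
    mark_done(conn, "c")
    # At t=10 only "a" is overdue: "b" has a longer SLA and "c" is done.
    print(due_for_resend(conn, now=10))  # [('a', 'charge card')]
```

Because the sweep only reads the table, the queue's contents can be thrown away at any time; whatever is still pending past its SLA simply gets sent again.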

Amazon Simple Workflow Service (SWF) is an implementation of this pattern: http://aws.amazon.com/swf/. I've never used SWF, but the docs are great food for thought.

[edit]

It might also be worth watching Rich Hickey's talk 'Language of the System' (https://www.youtube.com/watch?v=ROor6_NGIWU&feature=kp; there are a few versions of this talk, and I'm not sure this is the exact recording I saw). The talk is not really about queues, but he breaks a system down into its separate parts: you need a data store, you need a queue, etc. As soon as a queue tries to have durable messages it is becoming a database and has all of the problems a database has to deal with. Instead you could keep data storage being solved by the data storage provider and let queues focus on passing messages.

This raises the issue of how to deal with dropped messages, but that can be solved without durable queues (as that paper describes, or as SWF does).
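A rough sketch of handling dropped messages without a durable queue, under simplifying assumptions: a plain dict stands in for the data store (the source of truth), and LossyQueue is a hypothetical in-memory queue that deliberately loses the first send of selected message IDs so the recovery path is visible. Each round re-sends whatever the store still marks as pending, so nothing depends on the queue keeping its contents.

```python
from collections import deque

class LossyQueue:
    """Non-durable, in-memory queue. For this sketch it silently drops the
    first send of any ID in `drop_first` to simulate lost messages."""

    def __init__(self, drop_first):
        self._items = deque()
        self._drop_first = set(drop_first)

    def send(self, msg_id):
        if msg_id in self._drop_first:
            self._drop_first.discard(msg_id)  # lost this time only
            return
        self._items.append(msg_id)

    def receive(self):
        return self._items.popleft() if self._items else None

def run(store, queue, process, max_rounds=10):
    """store maps message ID -> 'pending' | 'done' and is the only durable
    state. Each round re-sends every pending ID, then drains the queue.
    Returns the number of rounds it took to finish everything."""
    rounds = 0
    while any(status == "pending" for status in store.values()):
        if rounds == max_rounds:
            raise RuntimeError("messages still pending after max_rounds")
        rounds += 1
        for msg_id in [m for m, s in store.items() if s == "pending"]:
            queue.send(msg_id)
        while (msg_id := queue.receive()) is not None:
            # Resends can produce duplicates, so skip IDs already done.
            if store[msg_id] == "pending":
                process(msg_id)
                store[msg_id] = "done"
    return rounds

if __name__ == "__main__":
    store = {"a": "pending", "b": "pending", "c": "pending"}
    processed = []
    rounds = run(store, LossyQueue(drop_first={"b"}), processed.append)
    print(rounds, processed)  # 2 ['a', 'c', 'b']
```

In a real system the "round" would be a timer (or an SLA check like SWF's timeouts) rather than a loop, but the division of labour is the same: the store decides what still needs doing, and the queue just carries IDs.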

ryanjshaw | 11 years ago

Thanks for those links. There seem to be some good concepts in there for formulating a solution to my current personal challenge: integrating multiple disparate sources of real-time events. Some are transient and delivered with low latency; others are persistent but retrieved with high latency (up to 30 minutes!). All of them need to be analyzed (and potentially replayed and reanalyzed), producing a best-effort real-time feature stream while populating (and repopulating) a reliable data warehouse. It's taking me a long time to break the problem down into the right components.

> As soon as a queue tries to have durable messages it is becoming a database and has all of the problems a database has to deal with. Instead you could keep data storage being solved by the data storage provider and let queues focus on passing messages.

Yes; mind you, that doesn't exclude the queue from having a persistent backing store (to reduce the cases where your application has to be involved in replay); it just means applications shouldn't use queues as the golden source of events.
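A small sketch of keeping the golden source outside the queue, assuming an append-only log (here a Python list inside a hypothetical EventLog class, standing in for a database or warehouse table) that every event is written to before a best-effort copy goes out on the queue. The analyze function is an invented placeholder for whatever the real-time analysis does. The real-time view can miss events, but replaying from the log rebuilds the correct result.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    source: str
    value: float

@dataclass
class EventLog:
    """Append-only log playing the role of the golden source.
    Offsets are simply list indices in this sketch."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event

    def read_from(self, offset):
        return self.events[offset:]

def analyze(events):
    """Placeholder analysis: total value per source."""
    totals = {}
    for e in events:
        totals[e.source] = totals.get(e.source, 0.0) + e.value
    return totals

def live_feed(log, incoming, lost_offsets):
    """Write each event to the log first, then hand a best-effort copy to
    the real-time path; deliveries at `lost_offsets` are dropped."""
    delivered = []
    for event in incoming:
        offset = log.append(event)
        if offset not in lost_offsets:
            delivered.append(event)
    return delivered

if __name__ == "__main__":
    log = EventLog()
    incoming = [Event("sensor", 1.0), Event("feed", 2.5), Event("sensor", 4.0)]
    realtime = analyze(live_feed(log, incoming, lost_offsets={2}))
    rebuilt = analyze(log.read_from(0))  # replay from the golden source
    print(realtime)  # {'sensor': 1.0, 'feed': 2.5}
    print(rebuilt)   # {'sensor': 5.0, 'feed': 2.5}
```

read_from(offset) is what makes partial replay cheap: a consumer that remembers the last offset it handled can reanalyze from there, and late, high-latency events can be appended and picked up the same way.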