I suppose it depends on your message volume. To me, processing 100k messages and then getting a page however long later as the broker (or whatever) falls apart sounds much worse than head-of-line blocking and seeing the problem directly in my consumer. If I need to avoid head-of-line blocking, I can build whatever failsafe mechanisms I need for the problematic data and defer to some other queueing system (typically, just add an attempt counter and replay the message to the same kafka topic, and if attempts > X, send it off to wherever). I'd rather debug a worker problem than an infra scaling problem every day of the week and twice on Sundays.
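The attempt-counter pattern described above can be sketched roughly like this. This is a library-agnostic, pure-Python illustration; the threshold value, topic names, and function name are all placeholders, not details from the comment:

```python
MAX_ATTEMPTS = 5  # illustrative threshold (the "X" in the comment above)

def route_failed_message(headers: dict) -> tuple[str, dict]:
    """Decide where to replay a message that failed processing.

    Returns (destination_topic, updated_headers): either back to the
    same topic with an incremented attempt counter, or off to a
    dead-letter topic once the counter exceeds the threshold.
    Topic names here are placeholders.
    """
    attempts = int(headers.get("attempts", 0)) + 1
    updated = {**headers, "attempts": attempts}
    if attempts > MAX_ATTEMPTS:
        return ("dead-letter-topic", updated)
    return ("same-kafka-topic", updated)
```

In a real consumer you would call something like this in the failure path and produce the message to the returned topic, carrying the counter in a message header so it survives the round trip.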
singron|10 months ago
The parallel consumer solved this problem almost entirely. Only the most egregious cases, where some keys were ~3000x slower than others, would cause an issue, and then you could solve it by disabling that key for a while.
Misdicorl|10 months ago
I tend to prefer other queueing mechanisms in those cases, although I still work hard to keep 99th percentile and median latencies aligned, since a large gap can still cause issues (especially for monitoring).
Misdicorl|10 months ago