(no title)
mau | 3 years ago
Besides this, operating Kafka never required much effort a part when we needed to re-balance partitions across brokers. Earlier versions of Kafka required to handle it with some external tools to avoid network congestions, but I think this is part of the past now.
On the other hand, Kafka still needs to be used carefully, especially you need to plan topics/partitions/replications/retention but that really depends by the application needs.
Hackbraten|3 years ago
The crash happens 10–15 minutes into the downtime window of the first broker. Absolutely no one has been able to figure out why, or even to reproduce the issue.
Running out of things to try, we resorted to randomly changing all sorts of different combinations of consumer group timeouts, which are imho poorly documented so no one really understands which means which anyway. Of course all that tweaking didn’t help either (gunshot debugging never does).
This has been going on for the last two years. As far as I know, the issue still persists. Everyone in that project is dreading Amazon’s monthly patch event.
sumtechguy|3 years ago