

miguno | 5 years ago

See my previous answer further up in this sub-thread. Neither Kafka nor BookKeeper requires data repartitioning when new nodes are added. Instead, both require data rebalancing, which moves some data from the existing nodes to the newly added ones.

Think of it this way: repartitioning changes the logical layout of the data (for example, which partition a given key lands in), which can affect app semantics depending on your application. Data (re)balancing, by contrast, just moves the stored bytes around "as is" behind the scenes, without changing the data itself. The confusion probably stems from the two words sounding so similar.
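A minimal Python sketch of the distinction, assuming a hash-based partitioner for keyed records. The CRC32 hash, the partition counts, and the broker names are illustrative stand-ins I chose (Kafka's default partitioner for keyed records uses murmur2, not CRC32), but any deterministic hash shows the same effect:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

keys = [f"user-{i}" for i in range(1000)]
before = {k: partition_for(k, 6) for k in keys}

# Repartitioning: growing the topic from 6 to 8 partitions changes the
# logical key -> partition mapping for many keys, which can break
# per-key ordering or co-partitioning assumptions in the application.
after = {k: partition_for(k, 8) for k in keys}
moved_keys = sum(1 for k in keys if before[k] != after[k])

# Rebalancing: moving partitions to a newly added broker only changes
# where the bytes are stored; the partition count stays at 6, so the
# key -> partition mapping is untouched.
placement_before = {p: "broker-1" if p % 2 == 0 else "broker-2" for p in range(6)}
placement_after = dict(placement_before)
placement_after[4] = "broker-3"  # hypothetical newly added broker
placement_after[5] = "broker-3"
unchanged_mapping = all(partition_for(k, len(placement_after)) == before[k] for k in keys)

print(f"repartitioning moved {moved_keys} of {len(keys)} keys")
print(f"rebalancing kept key->partition mapping: {unchanged_mapping}")
```

The point is that only the first operation is visible to your application logic; the second is purely a storage concern.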

For Kafka, you can use tools such as Confluent's Auto Data Balancer (https://docs.confluent.io/current/kafka/rebalancer/index.htm...) or LinkedIn's Cruise Control (https://github.com/linkedin/cruise-control), which automatically rebalance the data in your Kafka cluster in the background. Pulsar has its own toolset to achieve the same.
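Tools like these compute a reassignment plan and then apply it. As a rough illustration only, here is a hypothetical greedy planner (`plan_rebalance` is my own name, not part of either tool) that balances replica counts after a broker is added and emits JSON in the shape Kafka's `kafka-reassign-partitions.sh` accepts via `--reassignment-json-file`. Real balancers also weigh disk usage, network load, rack placement, and leadership, so treat this as a sketch:

```python
import json
from collections import Counter

def plan_rebalance(assignment, brokers):
    """Hypothetical greedy planner: move replicas from the most- to the
    least-loaded broker until replica counts differ by at most one.

    assignment: {(topic, partition): [broker_id, ...]} current replica lists
    brokers: all broker ids, including newly added (empty) ones
    """
    current = {tp: list(replicas) for tp, replicas in assignment.items()}
    while True:
        load = Counter({b: 0 for b in brokers})
        for replicas in current.values():
            load.update(replicas)  # count one replica per broker id
        busiest = max(brokers, key=lambda b: load[b])
        idlest = min(brokers, key=lambda b: load[b])
        if load[busiest] - load[idlest] <= 1:
            break
        # Pick a partition on the busiest broker that the idlest one lacks.
        candidate = next(
            (tp for tp, reps in sorted(current.items())
             if busiest in reps and idlest not in reps),
            None,
        )
        if candidate is None:
            break
        reps = current[candidate]
        reps[reps.index(busiest)] = idlest  # only the replica location changes
    moved = sorted(tp for tp in current if current[tp] != assignment[tp])
    return {
        "version": 1,
        "partitions": [
            {"topic": t, "partition": p, "replicas": current[(t, p)]}
            for (t, p) in moved
        ],
    }

# Example: topic "orders" with 4 partitions on brokers 1 and 2; broker 3 is new.
existing = {("orders", p): [1, 2] for p in range(4)}
plan = plan_rebalance(existing, brokers=[1, 2, 3])
print(json.dumps(plan, indent=2))
```

Note that the plan never touches the number of partitions, only which brokers host their replicas, which is exactly why rebalancing is invisible to the application.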
