trengrj | 5 years ago
I see the same with Spark vs Flink in that similarities outweigh differences. I wonder if this is some sort of emergent pattern in open source software.
majidazimi|5 years ago
1. A single partition is stored on one node (with replicas on other nodes). Because of this, introducing new nodes takes a very long time when partitions are large, since a partition can be replicated from only one node (the partition's leader). In Pulsar, each segment of a partition is stored on a different BookKeeper node.
2. Because of 1, if two consumers read parts of a partition that are far apart, they compete for disk bandwidth. In Kafka, a consumer cannot read from a replica node. If a topic is really popular and many consumers read from it at different offsets (which makes the OS page cache useless), the total consumption rate is limited to the disk bandwidth of a single node. In Pulsar, each consumer can read from different brokers. Catch-up consumers won't thrash streaming consumers in Pulsar.
These are not problems that can be fixed easily. Additionally, in the realm of streaming, the difference between Flink and Spark is night and day. The low watermark feature that Flink offers makes them behave fundamentally differently.
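To make points 1 and 2 concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: round-robin segment placement, identical disks, and no page cache, network limits, or replication factor. The node names and the helper functions `place_segments` and `peak_cold_read_bandwidth` are made up for illustration and are not part of Kafka or Pulsar.

```python
from typing import Dict, List


def place_segments(num_segments: int, nodes: List[str]) -> Dict[int, str]:
    """Assign segment index -> node round-robin (a simplifying assumption)."""
    return {seg: nodes[seg % len(nodes)] for seg in range(num_segments)}


def peak_cold_read_bandwidth(placement: Dict[int, str], disk_mb_per_s: int) -> int:
    """Upper bound on aggregate read rate when consumers hit different segments.

    Each distinct node holding a segment contributes one disk's bandwidth.
    """
    return len(set(placement.values())) * disk_mb_per_s


nodes = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes

# Leader-only model: every segment of the partition sits on one node.
leader_only = {seg: "node-a" for seg in range(6)}

# Segment-spread model: segments of the same partition land on different nodes.
segment_spread = place_segments(6, nodes)

print(peak_cold_read_bandwidth(leader_only, 200))     # bounded by one disk
print(peak_cold_read_bandwidth(segment_spread, 200))  # bounded by three disks
```

In this model the leader-only layout caps cold reads at a single disk no matter how many nodes exist, which is the bottleneck described in point 2.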
toomanybits|5 years ago
2. Kafka consumers can read from a replica node. The feature is relatively new, but it's there.
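As far as I know this is follower fetching, which arrived with KIP-392 in Kafka 2.4. A minimal sketch of the settings involved, written as plain Python dicts rather than a real client call. The rack name `rack-1` and the helper `to_properties` are made up for illustration; the setting keys are the ones I believe Kafka uses, so check them against the docs for your version.

```python
# Broker side (server.properties): let the broker choose a replica in the
# consumer's rack instead of always serving fetches from the leader.
broker_settings = {
    "broker.rack": "rack-1",  # hypothetical rack name
    "replica.selector.class": "org.apache.kafka.common.replica.RackAwareReplicaSelector",
}

# Consumer side: advertise the consumer's rack so a nearby replica can be picked.
consumer_settings = {
    "client.rack": "rack-1",  # hypothetical, must match a broker.rack value
}


def to_properties(settings):
    """Render a settings dict as key=value lines, one per setting."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(to_properties(broker_settings))
print(to_properties(consumer_settings))
```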
unknown|5 years ago
[deleted]
qaq|5 years ago
z9e|5 years ago
The only thing I can see that could make this true is that Pulsar seems to have better elastic scalability. But it seems to score lower on everything else. It has a much more complex storage system that ends up not matching Kafka's high-end throughput at large scale.
From what I recall, Twitter ended up abandoning BookKeeper due to storage scale concerns. Related: https://blog.twitter.com/engineering/en_us/topics/insights/2...
toomanybits|5 years ago
leafboi|5 years ago
Just to add to this, ease of use/setup is also a huge factor. There are technologies I can just spin up with zero knowledge and learn as I go. These are huge factors in adoption, especially with Golang and Node.js.