top | item 12456437

(no title)

I'm one of the Kafka authors, so admittedly my view might be slightly biased

Here is a quick comparison of Kafka and Pulsar:

- Kafka is a complete streaming platform vs a messaging system which is what Pulsar is. Through Kafka Connect (http://www.confluent.io/blog/announcing-kafka-connect-buildi...), it has support for connectors to stream data between various sources and systems. Through Kafka Streams (http://www.confluent.io/blog/introducing-kafka-streams-strea...), it has support to do stream processing and transformations over Kafka topics.

- Broad adoption base: Kafka is very widely adopted across thousands of companies worldwide. https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

- Tunable durability and consistency knobs on the producer: The Kafka producer API allows the application to either wait until a message is fully committed across all replicas or just the leader. This allows applications to make the right tradeoffs for throughput vs durability. One size does not fit all.

- Performance and efficiency: Kafka supports zero-copy consumption allowing the consumers to read large amounts of data at high throughput. To the extent that I understand, Pulsar with its legder-broker model does not support zero-copy consumption.

- A lot of the reasons quoted for creating Pulsar are features that exist in Kafka and are used in production:

-- Kafka has multi-tenancy support through user-defined quotas (See this http://www.confluent.io/blog/sharing-is-caring-multi-tenancy...)

-- Kafka has support for authentication, authorization, user-defined ACLs (See this http://www.confluent.io/blog/apache-kafka-security-authoriza...)

-- Kafka has support for geo replication. In fact, that is the most common use case for Kafka in several companies. (See this https://engineering.linkedin.com/kafka/running-kafka-scale)

-- Latency: The end-to-end latency from publish to consume can be very low in Kafka (<10ms).

- Support for millions of topics: To the extent that I understand, both Pulsar and Kafka use ZooKeeper for metadata management. That is the main bottleneck for supporting a large number of topics and likely the same tradeoffs apply to both Kafka and Pulsar as a result.

- Storage model: The length of a partition in BookKeeper and hence in Pulsar is not bounded by the capacity of a server. So you have the ability to add servers to accommodate a workload spike.

This is merely a quick overview. There might be more aspects of this comparison that I'm missing.

discuss

BoorishBears|9 years ago

Why does this read less like a comparison and more like a "this shouldn't even exist they should have just went with our thing"-type pitch? (With proper consideration to the possibility of bias you mentioned)

itaifrenkel|9 years ago

Could you (Kafka/pulsar) devs add more details about low latency guarantees? With an in memory solution it is easier to talk about 99.9 or even more messages below 10ms latency. Also this requires a client side push protocol (like redis subscribe). Assuming there are no slow consumers involved is there a configuration that guarantees 10-20ms latency? At what percentile?

babo|9 years ago

How do you compare the geo-replication capabilities of Kafka vs. Pulsar?

mmerli|9 years ago

Added some info on geo-replication in Pulsar at https://github.com/yahoo/pulsar/blob/master/docs/GeoReplicat...