nehanarkhede | 9 years ago
Here is a quick comparison of Kafka and Pulsar:
- Kafka is a complete streaming platform, whereas Pulsar is a messaging system. Through Kafka Connect (http://www.confluent.io/blog/announcing-kafka-connect-buildi...), it supports connectors that stream data between various sources and systems. Through Kafka Streams (http://www.confluent.io/blog/introducing-kafka-streams-strea...), it supports stream processing and transformations over Kafka topics.
- Broad adoption base: Kafka is very widely adopted across thousands of companies worldwide. https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
- Tunable durability and consistency knobs on the producer: The Kafka producer API lets the application wait either until a message is fully committed across all replicas or only until the leader has acknowledged it. This allows applications to make the right tradeoffs for throughput vs durability. One size does not fit all.
- Performance and efficiency: Kafka supports zero-copy consumption, allowing consumers to read large amounts of data at high throughput. As far as I understand, Pulsar with its ledger-broker model does not support zero-copy consumption.
- A lot of the reasons quoted for creating Pulsar are features that exist in Kafka and are used in production:
-- Kafka has multi-tenancy support through user-defined quotas (See this http://www.confluent.io/blog/sharing-is-caring-multi-tenancy...)
-- Kafka has support for authentication, authorization, user-defined ACLs (See this http://www.confluent.io/blog/apache-kafka-security-authoriza...)
-- Kafka has support for geo replication. In fact, that is the most common use case for Kafka in several companies. (See this https://engineering.linkedin.com/kafka/running-kafka-scale)
-- Latency: The end-to-end latency from publish to consume can be very low in Kafka (<10ms).
- Support for millions of topics: As far as I understand, both Pulsar and Kafka use ZooKeeper for metadata management. That is the main bottleneck for supporting a large number of topics, so the same tradeoffs likely apply to both Kafka and Pulsar.
- Storage model: The length of a partition in BookKeeper and hence in Pulsar is not bounded by the capacity of a server. So you have the ability to add servers to accommodate a workload spike.
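To make the producer durability knob above concrete, here is a minimal sketch of the two settings. It only builds a plain `java.util.Properties` object, so it runs without the Kafka client on the classpath. The class and method names (`ProducerDurability`, `producerProps`) and the broker address are placeholders I made up for illustration; `acks`, `bootstrap.servers`, `key.serializer` and `value.serializer` are the standard producer config keys. Note that `acks=all` waits for the in-sync replicas, which may be fewer than every configured replica.

```java
import java.util.Properties;

public class ProducerDurability {
    // Hypothetical helper: builds producer settings for one of two durability levels.
    public static Properties producerProps(boolean waitForAllReplicas) {
        Properties props = new Properties();
        // Placeholder broker address; replace with a real cluster.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // "all": wait until the in-sync replicas have the message (more durable, slower).
        // "1":   wait only for the partition leader (faster, can lose data on leader failure).
        props.setProperty("acks", waitForAllReplicas ? "all" : "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps(true).getProperty("acks"));  // prints "all"
        System.out.println(producerProps(false).getProperty("acks")); // prints "1"
    }
}
```

With the kafka-clients library available, the resulting `Properties` would be passed to `new KafkaProducer<>(props)`; that step is left out here to keep the sketch dependency-free.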
This is merely a quick overview. There might be more aspects of this comparison that I'm missing.