jackvanlightly's comments
jackvanlightly | 4 months ago | on: Kafka is Fast – I'll use Postgres
jackvanlightly | 5 years ago | on: Pulsar – an open-source distributed pub-sub messaging platform
When reads do need to go to BookKeeper, there are caches there too, with read-ahead to populate the cache and avoid going back to disk for every read.
Even when having to go to disk, there are further optimizations in how data is laid out on disk to ensure as much sequential reading as possible.
Also note that the fragments aren't necessarily that small either.
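To make the read-ahead idea concrete, here's a toy sketch (my own illustration, not BookKeeper's actual code): on a cache miss, fetch a whole batch of subsequent entries in one sequential read, so the next reads are served from memory.

```python
# Toy read-ahead cache: one sequential "disk" read populates the
# cache for the next batch_size entries (hypothetical sketch).
class ReadAheadCache:
    def __init__(self, storage, batch_size=4):
        self.storage = storage      # simulates entries laid out sequentially on disk
        self.batch_size = batch_size
        self.cache = {}
        self.disk_reads = 0

    def read(self, entry_id):
        if entry_id not in self.cache:
            self.disk_reads += 1    # one sequential read covers a whole batch
            end = min(entry_id + self.batch_size, len(self.storage))
            for i in range(entry_id, end):
                self.cache[i] = self.storage[i]
        return self.cache[entry_id]

storage = [f"entry-{i}" for i in range(8)]
cache = ReadAheadCache(storage, batch_size=4)
for i in range(8):
    cache.read(i)
print(cache.disk_reads)  # 2 disk reads instead of 8
```

A sequential reader touching 8 entries only pays for 2 disk reads here, which is the same effect the BookKeeper read-ahead is after.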
jackvanlightly | 5 years ago | on: Pulsar – an open-source distributed pub-sub messaging platform
Just like RabbitMQ, Apache Kafka and many other distributed systems, writes go through an elected leader, which can enforce ordering guarantees.
Specifically with Apache Pulsar, each topic has an owner broker (leader) who accepts writes and serves readers.
It should be noted that Apache Pulsar supports shared subscriptions, which allow two or more consumers to compete for the same messages, like having two consumers on a RabbitMQ queue. Here FIFO order cannot be guaranteed, for example because a message redelivered after a consumer failure can arrive behind messages that were published later.
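A quick sketch of one way FIFO breaks with competing consumers (my own simplified model, not any broker's actual redelivery logic): consumer A takes message 1 and crashes, so the broker redelivers it after consumer B has already processed message 2.

```python
import queue

# The broker's pending messages, in publish order.
broker = queue.Queue()
for m in [1, 2]:
    broker.put(m)

processed = []

m = broker.get()                 # consumer A takes message 1...
broker.put(m)                    # ...crashes, so the broker redelivers it
processed.append(broker.get())   # consumer B processes message 2 first
processed.append(broker.get())   # the redelivered message 1 arrives last

print(processed)  # [2, 1] — out of FIFO order
```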
jackvanlightly | 5 years ago | on: Queues are Databases (1995)
The latest generation of durable messaging systems that offer queue semantics do so by modelling those semantics over a distributed, replicated log, such as Apache Pulsar and RabbitMQ's new replicated queue type called Quorum Queues.
A queue is different from a log in that reading from a queue is destructive, whereas reading from a log is not. So if I have two applications (shipping and auditing) that each want a queue with all the shipping orders in it, then each needs its own queue so they don't compete over the messages. A log, by contrast, can be read by both, but each must track its own independent position in the log.
Apache Pulsar offers queue semantics to shipping and auditing by storing the shipping orders in one distributed log (a topic) and creating two separate subscriptions (also logs) that track the position (like Kafka consumer offsets). The destructive read of a queue is simulated by advancing the cursor (offset) of the subscription. The performance improvement this append-only log data structure offers compared to a mutable B-tree of an RDBMS is massive.
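As a toy model (my own sketch, not Pulsar's API): one append-only list holds the messages, and each subscription is just a cursor into it. "Consuming" advances the cursor rather than deleting anything, so shipping and auditing each see every message.

```python
# Queue semantics modelled over a shared append-only log (hypothetical sketch).
class Log:
    def __init__(self):
        self.entries = []   # the append-only log: never mutated, only appended
        self.cursors = {}   # one cursor per subscription

    def append(self, msg):
        self.entries.append(msg)

    def subscribe(self, name):
        self.cursors[name] = 0

    def consume(self, name):
        pos = self.cursors[name]
        if pos >= len(self.entries):
            return None
        self.cursors[name] = pos + 1  # the "destructive read" is just a cursor advance
        return self.entries[pos]

log = Log()
log.subscribe("shipping")
log.subscribe("auditing")
log.append("order-1")
log.append("order-2")

print(log.consume("shipping"))  # order-1
print(log.consume("auditing"))  # order-1 — same message, independent cursor
```

Note that after both consumes the log still contains both orders; only the cursors moved, which is exactly why an append-only structure avoids the mutation cost of a queue table.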
Quorum queues do it a different way, but still model queue semantics over a log.
Of course, some future RDBMS storage backend wouldn't have to use B-trees and read_past locking etc.; it could use a log-based data structure for message storage too.
jackvanlightly | 5 years ago | on: An introduction to RabbitMQ
This is a simplistic take. Kafka isn't just about scale; like other messaging systems, it provides queue/streaming semantics for applications. Sure, you can roll your own queue on a database for small use cases, but it adds complexity to the lives of developers. You can offload the burden of running Kafka by choosing a Kafka-as-a-service vendor, but you can't offload the additional work the developer takes on by using a database as a queue.