jackvanlightly's comments
jackvanlightly | 4 months ago | on: Kafka is Fast – I'll use Postgres
jackvanlightly | 5 years ago | on: Pulsar – an open-source distributed pub-sub messaging platform
When reads do need to go to BookKeeper, there are caches there too, with read-ahead to populate the cache and avoid going back to disk for every read.
Even when having to go to disk, there are further optimizations in how data is laid out on disk to ensure as much sequential reading as possible.
Also note that the fragments aren't necessarily that small either.
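To make the read-ahead idea concrete, here's a toy sketch (my own illustration, not BookKeeper's actual code): on a cache miss, fetch a whole batch of subsequent entries in one sequential read, so the next reads are served from memory.

```python
# Toy read-ahead cache: one sequential "disk" read populates the
# cache for the next batch_size entries (hypothetical sketch).
class ReadAheadCache:
    def __init__(self, storage, batch_size=4):
        self.storage = storage      # simulates entries laid out sequentially on disk
        self.batch_size = batch_size
        self.cache = {}
        self.disk_reads = 0

    def read(self, entry_id):
        if entry_id not in self.cache:
            self.disk_reads += 1    # one sequential read covers a whole batch
            end = min(entry_id + self.batch_size, len(self.storage))
            for i in range(entry_id, end):
                self.cache[i] = self.storage[i]
        return self.cache[entry_id]

storage = [f"entry-{i}" for i in range(8)]
cache = ReadAheadCache(storage, batch_size=4)
for i in range(8):
    cache.read(i)
print(cache.disk_reads)  # 2 disk reads instead of 8
```

A sequential reader touching 8 entries only pays for 2 disk reads here, which is the same effect the BookKeeper read-ahead is after.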
jackvanlightly | 5 years ago | on: Pulsar – an open-source distributed pub-sub messaging platform
Just like RabbitMQ, Apache Kafka and many other distributed systems, writes go through an elected leader, which can enforce ordering guarantees.
Specifically with Apache Pulsar, each topic has an owner broker (leader) who accepts writes and serves readers.
It should be noted that Apache Pulsar supports shared subscriptions, which allow two or more consumers to compete for the same messages, like having two consumers on a RabbitMQ queue. Here FIFO order cannot be guaranteed, for example because a message redelivered after a consumer failure can arrive behind messages that were published later.
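A quick sketch of one way FIFO breaks with competing consumers (my own simplified model, not any broker's actual redelivery logic): consumer A takes message 1 and crashes, so the broker redelivers it after consumer B has already processed message 2.

```python
import queue

# The broker's pending messages, in publish order.
broker = queue.Queue()
for m in [1, 2]:
    broker.put(m)

processed = []

m = broker.get()                 # consumer A takes message 1...
broker.put(m)                    # ...crashes, so the broker redelivers it
processed.append(broker.get())   # consumer B processes message 2 first
processed.append(broker.get())   # the redelivered message 1 arrives last

print(processed)  # [2, 1] — out of FIFO order
```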
jackvanlightly | 5 years ago | on: Queues are Databases (1995)
The latest generation of durable messaging systems that offer queue semantics do so by modelling those semantics over a distributed, replicated log, such as Apache Pulsar and RabbitMQ's new replicated queue type called Quorum Queues.
A queue is different from a log in that reading from a queue is destructive, whereas reading from a log is not. So if I have two applications (shipping and auditing) that each want a queue with all the shipping orders in it, then each needs its own queue so they don't compete over the messages. A log, by contrast, can be read by both, but each must track its own independent position in the log.
Apache Pulsar offers queue semantics to shipping and auditing by storing the shipping orders in one distributed log (a topic) and creating two separate subscriptions (also logs) that track the position (like Kafka consumer offsets). The destructive read of a queue is simulated by advancing the cursor (offset) of the subscription. The performance improvement this append-only log data structure offers compared to a mutable B-tree of an RDBMS is massive.
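As a toy model (my own sketch, not Pulsar's API): one append-only list holds the messages, and each subscription is just a cursor into it. "Consuming" advances the cursor rather than deleting anything, so shipping and auditing each see every message.

```python
# Queue semantics modelled over a shared append-only log (hypothetical sketch).
class Log:
    def __init__(self):
        self.entries = []   # the append-only log: never mutated, only appended
        self.cursors = {}   # one cursor per subscription

    def append(self, msg):
        self.entries.append(msg)

    def subscribe(self, name):
        self.cursors[name] = 0

    def consume(self, name):
        pos = self.cursors[name]
        if pos >= len(self.entries):
            return None
        self.cursors[name] = pos + 1  # the "destructive read" is just a cursor advance
        return self.entries[pos]

log = Log()
log.subscribe("shipping")
log.subscribe("auditing")
log.append("order-1")
log.append("order-2")

print(log.consume("shipping"))  # order-1
print(log.consume("auditing"))  # order-1 — same message, independent cursor
```

Note that after both consumes the log still contains both orders; only the cursors moved, which is exactly why an append-only structure avoids the mutation cost of a queue table.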
Quorum queues do it a different way, but still model queue semantics over a log.
Of course, some future RDBMS storage backend wouldn't have to use B-trees and read_past locking etc.; it could use a log-based data structure for message storage too.
jackvanlightly | 5 years ago | on: An introduction to RabbitMQ
This is a simplistic take. Kafka isn't just about scale; like other messaging systems, it provides queue/streaming semantics for applications. Sure, you can roll your own queue on a database for small use cases, but it adds complexity to the lives of developers. You can offload the burden of running Kafka by choosing a Kafka-as-a-service vendor, but you can't offload the additional work the developer takes on by using a database as a queue.