top | item 35156380

MQTT vs. Kafka: An IoT Advocate's Perspective

155 points | teleforce | 3 years ago | influxdata.com | reply

113 comments

[+] Jemaclus|3 years ago|reply
This article appears to be comparing MQTT with Kafka plus Schema Registry. Using Schema Registry is not required to use Kafka, so OP overcomplicated their own setup for this comparison. There's no argument that Schema Registry is valuable, but it's not something that MQTT seems to provide out of the box, so the comparison seems flawed.

I'd be interested in a comparison that is actually apples-to-apples instead of introducing complexity with Schema Registry.

[+] adev_|3 years ago|reply
The article is pretty biased, comparing the complexity of a schema-free scenario (MQTT) to Kafka with a Schema Registry.

However, his point still remains: most of the Kafka usage I have seen in production is the result of a random architect/tech lead who tried to follow the event-sourcing hype train, and it's a recipe for disaster.

And in 90% of the cases, that could have been replaced by a trivial, lightweight Mosquitto (MQTT) server for 10% of the operating cost.

Kafka is a monster of complexity, notoriously hard to operate (Hello ZooKeeper) and to understand properly (Hello ordering, persistence and partitions).

If all you need is a simple, stupid publish/subscribe broker with topic/auth management, do yourself a favour and stay away from it.

[+] MuffinFlavored|3 years ago|reply
> However, his point still remains: most of the Kafka usage I have seen in production is the result of a random architect/tech lead who tried to follow the event-sourcing hype train, and it's a recipe for disaster.

While calling this out on a message board comment section is going to be well-received, asking "do we need this" while working at the company with said architect/tech lead is not well-received.

How many of us get paid to work jobs where we're basically told "shut up, this is what we're doing/using, go with it"?

[+] victor106|3 years ago|reply
> Kafka is a monster of complexity, notoriously hard to operate (Hello ZooKeeper) and to understand properly (Hello ordering, persistence and partitions).

100% this.

Even using managed Kafka is a pain for most use cases. We replaced managed Kafka with a simple PostgreSQL database using SKIP LOCKED as the queue mechanism; the dev teams' productivity tripled and our total cost of ownership decreased dramatically.
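For anyone unfamiliar with the pattern, here is a rough sketch of what "PostgreSQL as a queue with SKIP LOCKED" usually looks like. The table and column names (`jobs`, `payload`, `done_at`) are illustrative assumptions, not anything from the parent's setup:

```python
# Sketch of the Postgres-as-queue pattern: a jobs table plus FOR UPDATE SKIP
# LOCKED so concurrent workers never claim the same row and never block each
# other. Schema and names are illustrative; run these via any Postgres driver.

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS jobs (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    done_at timestamptz          -- NULL while the job is pending
);
"""

# Each worker runs this inside its own transaction. SKIP LOCKED makes a
# worker skip rows already locked by another worker instead of waiting.
CLAIM_ONE = """
UPDATE jobs
SET done_at = now()
WHERE id = (
    SELECT id FROM jobs
    WHERE done_at IS NULL
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
"""
```

Running `CLAIM_ONE` in a loop from N workers gives at-least-once processing with no extra infrastructure beyond the database you already operate.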

Don't think twice; think 10 times about whether you really need Kafka.

[+] lmm|3 years ago|reply
Kafka no longer requires ZooKeeper. If you need true master-master high availability from a datastore - which anyone who bothers with a load balancer for their application should demand; what's the point in running your application in an HA configuration if your datastore is a single point of failure? - then to the best of my knowledge Kafka is still the least bad option available. It's not the easiest thing to operate, but I'll take it over Galera or Greenplum any day.
[+] NovemberWhiskey|3 years ago|reply
I see Kafka deployed for things which have perhaps a few thousand messages per day. It's like "did you accidentally mis-specify by six orders of magnitude here?"
[+] gvtek0|3 years ago|reply
> However, his point still remains: most of the Kafka usage I have seen in production is the result of a random architect/tech lead who tried to follow the hype train

Don't look now but this is how people end up with k8s as well. "We need Kubernetes because we need containers." Google et al convinced people it's the only way to run containers in prod.

[+] cduzz|3 years ago|reply
I really have to disagree -- for what I've used it for kafka's basically been reliable and amazingly simple to manage.

The use case is, I think, in the grand scheme of things, pretty simple -- it's an "on-prem" infrastructure with a mixture of old and new servers and a mixture of SSDs and terrible old 3.5in rotational media. None of it is "cloud" -- just Kafka clusters deployed with Puppet, with fluent and the Kafka REST proxy feeding Kafka and Logstash or Vector reading from it, but... it just works. We've had one incident in the past 4 years, and that was because the network decided to go super asymmetric.

Anyhow, I've got lots of problems, but "running kafka" isn't one of them.

[+] petre|3 years ago|reply
> And in 90% of the cases, that could have been replaced by a trivial, lightweight Mosquitto (MQTT) server for 10% of the operating cost.

What about ZeroMQ? And what if one also needs to temporarily store the queued data, at least until it's delivered?

We use MQTT now, but with EMQX as a broker instead of Mosquitto. It has a HTTP API for managing users and ACLs which was easier to integrate than the equivalent Mosquitto MQTT API.

[+] foolfoolz|3 years ago|reply
> implying the operational costs of a server are captured in its per hour sticker price

managed kafka has been around a while

[+] septune|3 years ago|reply
Forget MQTT, Redis as a PUB/SUB will do 99% of the job most of the time.
[+] twawaaay|3 years ago|reply
One missing criterion is client complexity. MQTT is built to work well with very few resources on the client. Kafka, on the other hand, requires you to do things you just don't want on a small embedded device -- like opening multiple connections to multiple hosts. Kafka is also just a transport for messages, while MQTT is a much larger part of the stack and takes care of transporting individual values, which means you need less additional code on your super-restricted device.

That said, I don't understand all the complaining directed at Kafka in this thread. Kafka is a fantastic tool that provides unique properties and guarantees. As a tech lead/architect I love to have a good selection of tools for different situations. Kafka is a very reliable tool that fills an important role when creating distributed systems, and is particularly nice because it is easy to reason about. The negative opinions I heard in the past are typically from people who tried to use it for something it is not well suited for (like efficient transfer of large volumes of data), or because they misunderstood how to use its guarantees to construct larger systems.

At one place I met a team who was completely lost with their overloaded Kafka instance and requested to get external help to "further scale and tune" it.

I just touched the code on the producer and consumer sides: publish the large files to S3 rather than pushing them all through Kafka, send a simple message to Kafka with the metadata and the location of the payload in S3, and then have the consumer download it from the bucket. They were happy puppies in no time.
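This is the classic "claim check" pattern. A toy simulation of what changed, with in-memory dicts standing in for S3 and a Kafka topic (all names here are illustrative, not from the actual system):

```python
# Claim-check pattern in miniature: large payloads go to object storage,
# and only a small pointer message travels through Kafka. The dict and list
# below stand in for an S3 bucket and a Kafka topic respectively.

object_store = {}   # stands in for S3
topic = []          # stands in for a Kafka topic

def produce(key: str, payload: bytes) -> None:
    object_store[key] = payload                          # upload blob to "S3"
    topic.append({"s3_key": key, "size": len(payload)})  # tiny metadata message

def consume() -> bytes:
    msg = topic.pop(0)                  # read the pointer from "Kafka"
    return object_store[msg["s3_key"]]  # fetch the real payload from "S3"

produce("batch-0001", b"x" * 10_000_000)
data = consume()
print(len(data))  # the broker only ever carried a few dozen bytes
```

The broker stops being the bottleneck because it now moves kilobytes of metadata instead of gigabytes of payload.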

[+] justinclift|3 years ago|reply
An important question not mentioned in this article - and maybe not known by the author - is how much (Dev)Ops burden does each of these add?

In the places I've worked that use Kafka, it's 100% always a source of issues and operational headaches.

That's in fairly high throughput environments though, no idea if it "just works" flawlessly in easy going ones.

[+] sigwinch28|3 years ago|reply
I wonder... how many issues was Kafka "soaking up" by dealing with concerns that applications and services didn't have to even consider?

As in, I wonder how much application developer burden would be present if using MQTT instead.

[+] ryanjshaw|3 years ago|reply
What issues did you run into?

From a technology perspective it's been rock solid for years in my experience.

Where issues crept in, it was always due to people not understanding the architecture and the patterns you need to use, e.g. anti-patterns like splitting batches into multiple messages, "everything must be stored in Kafka" thinking, not understanding how offset commits work, not understanding when to use keys or the effects of partitioning, resetting offsets on a live topic, aggressive retention policies, etc.

[+] Scubabear68|3 years ago|reply
For shops light on DevOps-fu, Confluent hosted Kafka is popular for just this reason.
[+] FridgeSeal|3 years ago|reply
If you’re on AWS I’ve had zero issues with their managed Kafka offering (MSK). I’m sure they did lots behind the scenes, but it was really one of our most rock-solid pieces of infrastructure.

If I had a need for Kafka in my current role, I’d probably give Confluent and Red Panda offerings a shot.

[+] outworlder|3 years ago|reply
> In the places I've worked that use Kafka, it's 100% always a source of issues and operational headaches.

Compared to what?

I have the opposite experience. For example, ingesting large amounts of log data. Kafka could handle an order of magnitude more events compared to Elasticsearch. Even if the data ultimately ended up in ES, being able to ingest with Kafka improved things considerably. We ended up getting an out of the box solution that does just that (Humio, now known as LogScale).

Similar experience when replacing RabbitMQ with Kafka. Neither "just works" and there are always growing pains in high-throughput applications, but that comes with the territory.

Is Kafka the source of headaches, or is it Zookeeper? Usually it's Zookeeper for me (although, again, Zookeeper has difficult problems to solve, which is why software packages use ZK in the first place).

[+] anonymousDan|3 years ago|reply
To be fair, it's not like it's solving a trivial problem. High throughput, reliable and highly available message queuing is just hard.
[+] drowsspa|3 years ago|reply
Where I work we have an on-premises Hadoop cluster and Kafka is its only stable component that works without constant headaches.
[+] speedgoose|3 years ago|reply
I hear from everyone using Kafka in production that it is hell unless you use Confluent.

I gave NATS JetStream a try, but I haven't been convinced by the performance of the Python client, nor the JavaScript one. I don't have extreme data; I just need decent performance.

I'm thinking about giving RabbitMQ streams a try. I have been very happy with RabbitMQ; the MQTT plugin isn't fully working (the big issue is that retained messages are not sent to wildcard subscribers), but it should work with AMQP.

[+] EdwardDiego|3 years ago|reply
Welp, here's a dissenting opinion - it's not.

I've run self-managed, sorta managed (MSK), fully managed (Confluent Cloud), and somewhat managed (Strimzi).

It is complex, yes, but it solves a very complicated problem. The issue tends to arise when people use it when simpler alternatives exist for their problem.

[+] grepLeigh|3 years ago|reply
I've had an excellent experience using the Rust NATS client!

I pump time series data through NATS running on a Raspberry Pi, which is part of a 3D printer monitoring and event/automation system. I also use NATS as an MQTT broker, for compatibility with other software in the 3D printer ecosystem.

FWIW I also have lots of experience running large Kafka and Rabbitmq fleets. The choice between these technologies depends on what you're optimizing for.

[+] outworlder|3 years ago|reply
> I'm thinking about giving a try to RabbitMQ

We went the opposite route. Kafka has been much better. Up to a certain volume, both solved the problem. When RabbitMQ required too much tuning, a decision was made to go to Kafka, and it's been stellar.

Both are pretty good, but understand that there are too many variables involved and you can't really escape production hell indefinitely, regardless of what you pick. What changes is when you are going to see the flames, and what is going to spark them.

[+] amath|3 years ago|reply
Have you tried Pulsar or Redpanda? Both seem mature enough and provide decent performance to probably meet your needs. What I hear is that Redpanda is a lot easier to manage than Kafka.
[+] mikedelago|3 years ago|reply
Kafka (along with ZooKeeper) really isn't that bad to self-host.

IME, it's easy if the org has a half-decent infrastructure/configuration-as-code setup.

[+] physicles|3 years ago|reply
The blog post series seems to bury the lede -- it isn't until part 3 [1] that we get to the insight that MQTT and Kafka solve different problems and therefore can have complementary roles in the same system.

We use this architecture for IoT: MQTT for the edge because it's standard and super good enough, and Kafka because it turns out we want to do more than one thing with the data as it streams in, so not using Kafka would end up being more complicated.

Here's a key insight though: for IoT you don't want to use an actual MQTT broker, like Mosquitto or HiveMQ. If you do, it's hard to avoid data loss. Server-side, you have something that subscribes to those MQTT topics and pushes the data into Kafka. What do you do when that thing needs to be restarted? OK, you can use persistent sessions in your MQTT broker. But how much memory does your MQTT broker need? What if your MQTT broker crashes? Oops, now your MQTT broker needs its own persistent database to keep track of all those messages in limbo.

What you want is an MQTT gateway -- something that looks like an MQTT broker to the devices, but the server side does something different with received messages. When it gets the MQTT PUBLISH command, it sends the message to Kafka, waits for the ack from Kafka, and only then sends PUBACK back. Presto, the MQTT thing is now stateless and horizontally scalable. Your clients just need some retry logic.
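The ack-ordering described here can be sketched in a few lines. This is a toy model of the gateway logic only (the fake in-memory producer and all names are illustrative; a real gateway would speak the MQTT wire protocol and use a real Kafka producer with acks=all):

```python
# Stateless MQTT-gateway sketch: only send PUBACK after the downstream log
# (a faked Kafka producer here) confirms the write. If the gateway crashes
# before acking, the QoS 1 client simply re-sends the PUBLISH.

kafka_log = []

def kafka_produce_sync(topic: str, payload: bytes) -> bool:
    # A real producer would block until the broker acknowledges the write.
    kafka_log.append((topic, payload))
    return True

def handle_publish(topic: str, payload: bytes) -> str:
    # Forward first; acknowledge the device only once Kafka has the message.
    if kafka_produce_sync(topic, payload):
        return "PUBACK"
    return ""  # no ack -> the client retries, giving at-least-once delivery

ack = handle_publish("sensors/temp", b"21.5")
print(ack)  # PUBACK
```

Because the gateway holds no message state of its own, you can run as many replicas as you like behind a load balancer.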

Maybe Kafka's mqtt-proxy does this, I don't know. I don't think it's mentioned in part 3. But it's a key property of such a system. I'm guessing Amazon's IoT gateway does this, because once you've thought about it hard enough it becomes obvious this is how it needs to work.

1: https://www.influxdata.com/blog/mqtt-vs-kafka-iot-advocates-...

[+] skrtskrt|3 years ago|reply
there’s a lot of “Kafka causes so many issues!” comments here.

I think it gets a bad rap because it gets introduced to orgs without the org having the requisite level of understanding. If your whole org is just on a standard OLTP/OLAP setup, and then suddenly there's a Kafka queue, there's going to be a serious learning curve and bumps along the way.

If you're incorrectly using an async event broker as the datastore where you should be using a synchronous DB and then streaming from the DB to Kafka with an outbox pattern, you're going to have a bad time.

If you’re not modeling your queue depth and throughput you’re going to have a bad time.

If you’re not modeling your concurrency scenarios and synchronization, you’re going to have a bad time.

[+] kerblang|3 years ago|reply
Since nobody else mentioned it: often you'll want multiple consumers of a given topic for load balancing/failover - that's on the consumer side. Support for this in the MQTT standard is poorly defined, and popular libraries handle it poorly, to the point that I can't recommend trying it. Proper message queues load-balance consumers by locking the entire topic once per delivery; Kafka does it by assigning "partitions" of a topic to different consumers, and thus you get faster, lock-free delivery (Kafka is essentially a "hot-rodded" message queue, just like hot-rodding a car by removing certain "proper" guard rails).
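The partition mechanism described here can be illustrated in miniature. The hash, partition count, and consumer names below are made-up simplifications (Kafka actually uses murmur2 on the key bytes and a pluggable group assignor), but the property is the same:

```python
# Kafka-style consumer-group load balancing: a key hashes to a partition,
# each partition is owned by exactly one consumer in the group, so delivery
# is parallel across keys yet ordered per key, with no topic-wide lock.

NUM_PARTITIONS = 4
consumers = ["consumer-a", "consumer-b"]  # a consumer group of two

def partition_for(key: str) -> int:
    # Any stable hash shows the idea; Kafka really uses murmur2.
    return sum(key.encode()) % NUM_PARTITIONS

def owner_of(partition: int) -> str:
    # Simplified static assignment: partitions split across the group.
    return consumers[partition % len(consumers)]

for key in ["device-1", "device-2", "device-1"]:
    p = partition_for(key)
    print(key, "-> partition", p, "->", owner_of(p))
# The same key always lands on the same partition, hence the same consumer,
# so per-device ordering is preserved while the group shares the load.
```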
[+] ar9av|3 years ago|reply
I've been combing the interwebs and following countless tutorials for different types of IoT data-gathering solutions using various messaging services and brokers, for a scale weighment system that I've been developing using VueJS (eventually looking to migrate to Nuxt).

So far the concept is simple: a weighment scale has an RS232 (COM) port that streams out the tare, net, and gross weights; some kind of microcontroller (Raspberry Pi, Arduino, PyBoard, etc.) forwards that data via some sort of messaging service.

Clients connect to a page where they can "pub/sub" to scales to see the current weight of specific scales from a remote device. The page acts as a UI to either manage inventory, register consumed ingredients, or register completed tasks.

So far, most of the development scripts and test environments I've built seem to do basically the same thing and will gather the data that's needed, but I'm curious whether anyone has dealt with anything like this in the wild, and whether there are any caveats I'm missing or other technologies better suited for what I'm trying to achieve.

Right now Kafka and RabbitMQ seem to be my main choices for message brokers, mostly because they are fairly easy to set up via Docker. If anyone has any recommendations on libraries I should look into, that would be awesome! The UI is coming together nicely; I started it in React but switched to Vue 3 after I fell in love with the component architecture and composition API.

[+] hkt|3 years ago|reply
A better comparison with Kafka is Redis Streams. Similar semantics, a fraction of the operational overhead.
[+] mrkeen|3 years ago|reply
I jumped onto https://mqtt.org/ to try to answer my usual use-case question about non-Kafka messaging, which is: "Do the messages get saved anywhere so you can come back and read them later?" Still not entirely sure about it.

But I did see:

    This is why MQTT has 3 defined quality of service levels: 0 - at most once, 1 - at least once, 2 - exactly once
I'm a big fan of advertising the impossible on the front page.
[+] jon-wood|3 years ago|reply
MQTT isn’t designed as a persistent log, but can fulfil some of what you might want to use one for.

Each message has a couple of flags, the first being Quality of Service, which, as you quoted above, determines delivery guarantees. 0 is fire and forget, with potential loss of messages. 1 will queue messages for delivery to offline clients that are subscribed to a topic (within reason; all brokers set limits on that), and 2 is often described as "exactly once" but is in fact just a more involved dance to acknowledge messages.

The other flag is a Retain flag, which instructs the broker to associate that message with the topic it was sent to, and send it on to any newly subscribing clients when they subscribe. This is good for use cases like remote device configuration - you can send it to a topic, setting the retain flag, and then when a device comes online it’ll immediately receive new configuration.
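The Retain behaviour is easy to model in memory. This is a sketch of the broker semantics only, not a real MQTT implementation, and the topic/payload names are illustrative:

```python
# Toy model of MQTT's Retain flag: the broker remembers the last retained
# message per topic and replays it to any newly subscribing client, so a
# device coming online late still receives its current configuration.

retained = {}     # topic -> last retained payload
subscribers = {}  # topic -> list of subscriber inboxes

def publish(topic: str, payload: str, retain: bool = False) -> None:
    if retain:
        retained[topic] = payload      # broker stores it for future subscribers
    for inbox in subscribers.get(topic, []):
        inbox.append(payload)          # normal delivery to current subscribers

def subscribe(topic: str) -> list:
    inbox = []
    subscribers.setdefault(topic, []).append(inbox)
    if topic in retained:
        inbox.append(retained[topic])  # replay the retained message on join
    return inbox

publish("devices/42/config", '{"interval": 30}', retain=True)
late_inbox = subscribe("devices/42/config")  # device comes online afterwards
print(late_inbox)  # ['{"interval": 30}']
```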

MQTT is great as a message queue for remote devices, mostly because it’s so lightweight anything with an IP stack can integrate with it, but I’m not sure why anyone would attempt to make it a piece of core infrastructure.

[+] avereveard|3 years ago|reply
> advertising the impossible

Eh, if you read the finer print, it's just a deduplication id appended to every message. The blog doesn't go into detail on what happens when two clients push a message with the same id, or what happens if there is more than one failure (e.g. the client fails to detect a service outage, and during the outage the message is consumed by the broker but persisting fails), but in general at-least-once delivery plus a deduplication id is not something revolutionary.
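For the curious, "at-least-once plus a deduplication id" in miniature looks like this (a sketch of the general idea, with made-up ids; a real receiver would also have to bound or persist its set of seen ids):

```python
# Effectively-once processing built from at-least-once delivery: the sender
# may retry a message after a lost ack, but the receiver tracks seen ids so
# the side effect is applied only once per id.

seen_ids = set()
applied = []

def receive(msg_id: str, payload: str) -> None:
    if msg_id in seen_ids:
        return             # duplicate delivery: acknowledge, don't re-apply
    seen_ids.add(msg_id)
    applied.append(payload)

receive("m-1", "open valve")
receive("m-1", "open valve")   # retry after a lost acknowledgement
receive("m-2", "close valve")
print(applied)  # ['open valve', 'close valve']
```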

[+] alexisread|3 years ago|reply
With MQTT it depends on the broker; e.g. Emitter.io can save them for a week. The offset for a client is usually stored on the broker, so if a client reconnects, all of the messages it has missed are forwarded to it.

As mentioned in other answers, service levels have a defined meaning, which is different from the absolute theoretical meaning, and really has to do with the message acks.

Really, as the article mentions, Kafka and MQTT are for different purposes, with some overlap. Kafka is all about the log, whereas MQTT is about uncertain connections. A better comparison, which I've yet to see, is MQTT vs. NATS.

Lastly, Kafka is much easier to administer using Redpanda, which doesn't have ZooKeeper, combines the registry and Kafka Connect (see WASM runners) with the runtime, and has a very nice console for debugging.

Similarly, Emitter.io does a great job with clustering for mqtt.

I'd like to see an open source Kafka-MQTT bridge that works in both directions, as they all seem to go MQTT->Kafka only.

[+] jonquark|3 years ago|reply
MQTT.org can't answer that, as it's a web page for a protocol. I've worked on platforms that do have a historian feature, but it will vary from broker to broker.

(Disclosure: I work on Eclipse Amlen, and it does not - but people often rig it up to a subscriber that funnels some/all messages into databases.)

[+] yawniek|3 years ago|reply
Many people seem to lack clarity on what a distributed log is, and in which architectures it's useful and in which it's not. If you are abusing a distributed log as a message queue, you are most of the time creating a mess.
[+] JaggerFoo|3 years ago|reply
I designed a facility IoT system with AWS products: IoT Core (MQTT broker), SiteWise (analytics dashboard), S3, and Lambda. I found the AWS offerings to have everything I needed in one place at a low cost. An added benefit was being able to support NIST cybersecurity requirements (CMMC v2, Level 2) in addition to the IoT system.

I was a fan of NATS and Kafka in the past, but AWS tooling makes IoT relatively easy.

Cheers

[+] jake_morrison|3 years ago|reply
I have done a lot of work with IoT-ish data, e.g. sending location and telemetry data from vehicles and remote sensors over unreliable cellular networks.

I need the ability to queue data on the device to deal with patchy connectivity and some policy for deleting messages in the queue. For example, I might throw away unsent location updates that are "stale" to save space. I might need to prioritize some messages, e.g. "lithium-ion battery pack overheating".
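The device-side queue policy described here can be sketched as a small priority queue with an age cutoff. The thresholds, priorities, and message strings are illustrative assumptions, not from any real firmware:

```python
# Device-side queue sketch: buffer messages while the cellular link is down,
# drop stale low-priority telemetry on drain, but always keep critical
# alerts. Lower priority number = more important; priority 0 never expires.

import heapq
import itertools

MAX_AGE_S = 300          # location fixes older than 5 minutes are "stale"
_seq = itertools.count() # tiebreaker so equal-priority messages stay FIFO

queue = []               # heap of (priority, seq, enqueue_time, message)

def enqueue(priority: int, message: str, now: float) -> None:
    heapq.heappush(queue, (priority, next(_seq), now, message))

def drain(now: float) -> list:
    """Flush the queue when connectivity returns, applying the drop policy."""
    out = []
    while queue:
        priority, _, ts, message = heapq.heappop(queue)
        if priority > 0 and now - ts > MAX_AGE_S:
            continue     # throw away stale, non-critical telemetry
        out.append(message)
    return out

t0 = 0.0
enqueue(1, "location: old fix", t0)
enqueue(1, "location: fresh fix", t0 + 400)
enqueue(0, "ALERT: battery pack overheating", t0)
result = drain(now=t0 + 401)
print(result)  # ['ALERT: battery pack overheating', 'location: fresh fix']
```

The alert jumps the queue and survives indefinitely, while the stale location fix is silently discarded to save bandwidth.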

I may be running on embedded hardware that is too small to run Linux. The connection to the modem might be serial.

Data size and bandwidth usage can make a difference. I might get 2MB/month for $3. Bytes count if I want to send frequent updates to get more precision on the location.

I generally know exactly what messages I am sending, so using a compiled format like gRPC or CoAP (https://en.wikipedia.org/wiki/Constrained_Application_Protoc...) can be better than JSON.

I may need to get through multiple layers of network address translation, making it difficult to send messages back, so keeping a persistent TCP connection can help. But using TCP for one-off connections wastes bytes, so UDP can be useful if I don't care about losing packets.

I may want to encrypt or digitally sign my messages.

So, I end up doing a lot of work to handle queueing locally, but using a pretty simple message-oriented binary protocol to send to the server. The server can do whatever it wants, e.g. write it to a Kafka queue.

Amazon's IoT framework nails a lot of these points. https://aws.amazon.com/blogs/compute/building-an-aws-iot-cor...

[+] jinmingjian|3 years ago|reply
MQTT and Kafka are different things. MQTT is not necessarily better or worse than Kafka, and vice versa.

But, sadly, people are often caught in their own loops, even when better options exist or can be created.

I have created a new, free, single binary data-service platform for IoT, JoinBase: https://joinbase.io/

This single-binary data-service platform has proved:

1. Kafka is not more suitable for industry, or higher performance, than MQTT, even at the protocol level.

With careful crafting, JoinBase has saturated the sustained write bandwidth of a PCIe 3.0 NVMe drive (25 million msg/s) on a single modern node. This, in fact, cannot be done by the Java-based Kafka.

We have provided a free, full-functionality community edition for testing: https://joinbase.io/products/

2. Using MQTT and Kafka together is unnecessary. It only makes your pipeline more complex and expensive, as well as less stable and slower.

JoinBase can do arbitrary message preprocessing and auto-views (WIP). Streaming does not have to be handled by a separate monster.

3. High performance and ease of use have nothing to do with the size of the software if the product is properly engineered.

The 5MB single-binary JoinBase is enough to beat many monsters in the IoT/AIoT data pipeline:

+ sustained batch MQTT message write throughput: ~10x faster than Kafka and ~5x faster than that of one popular broker

+ basic SQL analytics: 3-4x faster than ClickHouse

+ HTTP interface concurrent queries: ~100x higher than ClickHouse

+ ...

More can be seen in our 2022 summary blog: https://joinbase.io/blog/joinbase-2023/

There are historical reasons for all of these, of course. But it would be great if we broke out of our own mindset loops.

[+] mstaoru|3 years ago|reply
Most MQTT brokers are not great at storing data. The pattern that works the best for me in solving the "edge delivery" use case is a lightweight clustered MQTT broker (e.g. VerneMQ) with a little Lua script inside to push everything into Redis Streams (of course, clustered as well) immediately. Kafka is another alternative, but it's not that lightweight, operationally, and often Redis Streams are "good enough". With an LB in front of both Verne and Redis, this setup is pretty decent for IoT data ingress.
[+] gz5|3 years ago|reply
Good article (along with parts 2 and 3). Are there key differences in secure networking constructs (TLS, mTLS, VPN, whitelisted IPs, open ports, etc.) in the options described:

+ inbound to Kafka clusters and Kafka Connect?

+ inbound to Mosquitto MQTT broker?

+ inbound to Telegraf?

+ inbound to influxDB?

[+] Kinrany|3 years ago|reply
Is there a ZeroMQ-style protocol for a Kafka-like event log?

Messaging protocols abound, but they can't provide the semantics necessary for async communication.

[+] Animats|3 years ago|reply
Controlling emergency generators via AWS is scary.
[+] amrx101|3 years ago|reply
Another MQTT vs Kafka. It's been re-iterated many times that MQTT and Kafka solve different problems.