top | item 19927877

Hermes – A message broker built on top of Kafka

125 points| qluml | 6 years ago |allegro.tech | reply

54 comments

[+] lllr_finger|6 years ago|reply

Can someone help me understand the value proposition for Hermes? The only thing I can see is that it abstracts away producing to and consuming from Kafka. The use cases provided answer why you'd use a message broker system, but not why you'd want to do it over HTTP.

Edit: I understand HTTP is easier than Kafka, but is this something developers really struggle with when adopting Kafka? My experience is that they struggle with the nuances, behavior, and maintenance of Kafka/ZooKeeper more than anything.

I also didn't see how it dealt with concepts like exactly once delivery - any experiences in that area?

[+] thanatos_dem|6 years ago|reply

Exactly once delivery is not a thing, and confluent needs to stop openly lying to people about it. At least once delivery and idempotence is not the same, and has existed forever.

Calling it “exactly once” is marketing BS. It’s the same as Oracle claiming for years to support serializable transactions when they didn’t, except that one is technically possible, they just didn’t support it in actuality.

[+] Groxx|6 years ago|reply

re HTTP specifically: one major benefit I've seen to even "just" HTTP wrappers for systems is that the HTTP ecosystem is extremely mature, even on relatively exotic languages / platforms / coding patterns / design constraints / etc.

You want load balancing, context propagation, multiplexing, proxying, authentication, request tracing, [anything from a truly gigantic list, both in-code and around-your-system]? HTTP has it. Probably several. And they probably already work with everything you already have, and happily run unattended for years.

Kafka... might? Kafka for language X.... might? But probably not.

You want to extend Kafka to add X between Y and Z? Does the protocol even allow it? HTTP does, choose your flavor. Odds are even decent that a fair number of your engineers have already heard of or used it.

---

There are benefits to specialized protocols, absolutely. But there are also benefits to letting everything just use the same robust HTTP client as everything else.

[+] thibauts|6 years ago|reply

Exactly what I thought. It can alleviate the need to use flaky Kafka clients in some languages, but kind of disappointing in that it doesn't soften the main pain point of Kafka: operational and cognitive load.

[+] infecto|6 years ago|reply

I am not too familiar with Hermes but there is a lot of power exposing a http endpoint. I think you are thinking too inside of the box here. The benefit here is not adoption from developers or inability to understand how Kafka works with more native libs.

The value here is 1) publishing messages from more unique sources. Perhaps allowing your clients to publish messages. 2) You can enforce additional guarantees. Does the message conform to the a defined schema?

[+] napsterbr|6 years ago|reply

As someone evaluating Kafka for the first time, it would be useful to know what Hermes provides other than Kafka already does. After glancing at the homepage I see the REST api and the fact it is push based. Honestly I don't see how it would fit on my use case but interesting project nonetheless.

> exactly once delivery

Kafka is known to provide exactly once semantics - given your producers and consumers follow some rules, notably being idempotent. When ingesting from Kafka Stream API, it is actually exactly once delivery. There are a couple posts on confluent.io explaining how they achieve this (sorry, currently on mobile, can't copy-paste without having an outburst on how unusable touch devices are for me).

[+] guhcampos|6 years ago|reply

I see a bit of their reasoning. Taking from the article:

"When you have an environment with 20+ services, code sharing, maintenance and following updates become problematic. At Allegro we had the chance to find it out. It’s better to take out dependencies from business services as much as possible."

By adopting something like Hermes you are not really "taking out" a dependency, but abstracting it - as you said. Yet, by talking HTTP to the message broker you are abstracting away Kafka from your developers and your code. One less lib to depend on for each language you have in your architecture. One less version to control among your services, etc.

[+] xeronz|6 years ago|reply

I only skimmed the write up on it, but knowing kafka fairly well, the following could potentially work better than what is included by kafka: - push model (dependent on use case) - filtering - throttling/rate negotiation - exactly once (kafka out of the box does not dedupe on the broker)

[+] rswail|6 years ago|reply

Having just completed a project using Kafka as an event pipeline and a data store, one of the issues we found was that consumer polling takes a large chunk of resources.

Having a push model for consumption would certainly remove some of the complexity we had to deal with for scaling out consumption.

[+] ethnoe|6 years ago|reply

Hi there, I have been the technical team lead of Hermes team for ~four years, before Łukasz (the author of blogpost) took over. Thanks for taking time to read, think and write about our product :)

Our value proposition is built on four main aspects: * ease of integration * easier Kafka management * centralised management and validation * increased stability / reliability

Mind that some of the points don't make much sense unless you have a lot of services managed by a lot of independent teams. Thus Łukasz remark about "20+ microservices" in the original post. We run 700 microservices on prod managed by something close to 70 teams.

Ease of integration has been nicely summed up by others in this thread. HTTP tends to be the simplest way to integrate anything nowadays, at least in our case. While this comes at a cost, being able to get projects started up very quickly, without getting into details of proper handling of Kafka producer/consumer clients provided great value for us. Also while history might not be so important considering using Hermes in 2019 because Kafka matured, gained traction and recognisability among Software Engineers, it wasn't so easy to handle Kafka in 0.7/0.8 days when we started.

Of course switching to HTTP comes at a cost. I think the biggest one is using pure HTTP in push model. This makes it impossible to take advantage of Kafka data model, which guarantees event ordering at partition level. Zalando took a different approach with Nakadi (https://github.com/zalando/nakadi). I would say that at some point Hermes should consider following this path for more advanced users.

Easier Kafka management. Since we abstract away Kafka and hide it behind HTTP/REST API, we can easily introduce many changes to Kafka clusters. One of them was splitting Kafka cluster into two operate ones (one per our DC) without clients noticing. They were still publishing to same old Hermes instances, discovered via Consul. While doing it with clients might seem like a trivial thing to do when you have just a few services that use Kafka, with a few hundreds of clients it generates a lot of unnecessary work for developers.

Now whenever we need to do some maintenance with Kafka clusters (rebalance partitions, change cluster/hosts etc) we just route the traffic at Hermes level and no interaction with clients/developers is necessary.

Centralised management and validation. We started with publishing JSON. Along the way, as more and more people started consuming data offline (from Hadoop), it turned out that moving to some structured/schema based format is necessary, thus Avro. Hermes helped us a lot with this. It enables us to fail fast when someone starts publishing malformed requests for whatever reason, instead of relying on consumer (online and offline) to be hit and have to communicate with producer. Secondly support for Avro in JVM (our main microservice platform) is not that great and we put a lot effort into making it better (including publishing https://github.com/allegro/json-avro-converter). By having Hermes to do on-the-fly conversion for both publishers and subscribers we made it possible to only define schema and deal as little with Avro as possible in simple cases when it might not be beneficial for the team.

We also have Hermes integrated with our Service Catalog, so we can easily track ownership of topics and subscriptions. People publishing have easy access to information about who not only subscribes to online data, but also who accesses data offline (via Hadoop) using our offline clients feature. This way Hermes provides central place to manage our data streams.

Increased stability/reliability. This last one might be controversial, but in practice it did save us a few times. Mind, that I mean increased (more nines), not totally bulletproof. Kafka is a great, resilient piece of software. it is also complex and incidents happen. It might not even be that cluster is down - but increasing response times from few ms to 1second can be just as deadly. Hermes Frontend on the other hand is really simple. By putting it in front of Kafka together with built-in buffering support, we added a layer which increased our reliability. Now even if Kafka cluster has huge problems, we can accept incoming events for 2-3 hours, having time to either resolve the issue or reroute traffic to other cluster. This means that microservices don’t have to deal with data buffering on their own. Of course Hermes is still pretty much stateless by itself, so when traffic to Kafka flows normally, we can restart, spin up and spin down instances at will.

Entering danger zone: if both Kafka goes down and Hermes hosts blow up - the data is lost. This is a trade off and we are happy to say that for years running Hermes + Kafka on production, it never failed and saved us a few times.

I hope that I managed to clarify why we are using Hermes as main message bus powering our microservice architecture. We open sourced it, as we wanted to do our work in the open, sharing it with anyone who finds it useful and beneficial :)

[+] theomega|6 years ago|reply

This sounds very interesting. Did anyone get from the homepage what kinds of guarantees this offers? What if the HTTP endpoint where Hermes should push the data to is down? Does it retry? If yes, for how long?

[+] theomega|6 years ago|reply

Answering my own question (RTFM): https://hermes-pubsub.readthedocs.io/en/latest/user/subscrib...

You can configure how long it retries and with what strategy.

[+] SkyRocknRoll|6 years ago|reply

If any of you looking for message and streaming system under single system then pulsar.apache.org supports both and lot more reliable and scales better than kafka

[+] hestefisk|6 years ago|reply

Very nice with a simple wrapper. That said I’m wondering if 9 out of 10 use cases could do with something simpler, ie zeromq, which scales really well.

[+] unknown|6 years ago|reply

[deleted]

[+] twa927|6 years ago|reply

Can someone provide actual high-level use cases for using Kafka? Prefereably use-cases not handled by RabbitMQ.

I've seen a few talks about Kafka but they focused on the internals. My guess is that Kafka is for large systems for which managing a multi-node RabbitMQ cluster is too much trouble.

[+] josephg|6 years ago|reply

I’ve long had the inverse view - I’m not sure what good use cases there are for Rabbitmq that couldn’t be handled better by a Kafka cluster.

One company I worked with used Kafka as their central source of truth across the organisation. All events generated by users were thrown into a massive Kafka cluster. Each team in the organisation cared about a different view into that data (financials, marketing, fraud, what we display to that user on the website, etc). Each team would ingest the same kafka queue and do different things with it - often consuming certain events into their own Postgres instance, or other things like that.

I used Kafka when I made my reddit r/place clone a few years ago because it gives great read and write amplification. With Postgres as a central source of truth, you can only handle thousands of writes per second. And reads will slow down the instance. With Kafka you can handle about 2M/sec. And reads can really easily be serviced from other machines - you can just have a bunch of downstream Kafka instances consuming from the root, and serving your readers in turn.

It may be that you can also solve all these problems with a well configured rabbitmq cluster. But coming from a database world I find it more comfortable to reason about architecture, performance and correctness with Kafka.

[+] solidasparagus|6 years ago|reply

Kafka is a high-throughput, horizontally-scalable blob data store for data streams. The data store part of that is my favorite part.

You can use it as a simple message broker, but since it keeps the message history as a timeseries, you can also do things like run batch analysis jobs on the day's message or replay the last X hours of messages because your DB died and your backup is old.

It is a good way to decouple data producers and data consumer, particularly in an enterprise context - producers push to Kafka and anyone can consume that data, whether they are an operations team that wants a realtime data stream, a BI team that needs periodic data dumps, or a team that wants a long-term audit trail (the duration of the history is going to depend on your scale, but for many users a long history is realistic).

Kafka also has a nice ecosystem including streaming analytics (KSQL), clients that make reading from Kafka easily horizontally scalable (have many machines acting as a single client, automatically rebalancing if one of those machines dies), exactly once processing and probably more since I last worked with it.

I'm not familiar enough with RabbitMQ to say how it compares to Kafka, but I haven't found a use case yet where Kafka isn't a good choice (except for the 'I need to set up a message broker quick and painlessly' use case because it is not a particularly fun technology to manage yourself)

[+] chrisjc|6 years ago|reply

This is worth a read.

https://engineering.linkedin.com/distributed-systems/log-wha...

[+] mancini0|6 years ago|reply

A financial exchange - order messages are routed to Kafka and partitioned by the instruments symbol, match engines associated with a given set of symbols consume from their assigned partition. When a match-engine goes down it can reconstruct the order book by replaying from a given offset.

[+] unknown|6 years ago|reply

[deleted]

[+] ww520|6 years ago|reply

It's basically a high speed transaction log, persistent, distributed, easily scalable, that happens to store messages to do messaging brokering very well.

[+] DiseasedBadger|6 years ago|reply

Increasingly, all technology news sounds like:

"X: a blazingly fast X built on top of {something I vaguely thought did X}"

[+] dang|6 years ago|reply

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

[+] colechristensen|6 years ago|reply

Hermes seems to wrap Kafka with HTTP to make integrating it simpler (and I think less robust).

[+] ravenstine|6 years ago|reply

Don't forget "modern".

[+] exabrial|6 years ago|reply

Ironically: Hermes was the name of a JMS user interface/testing tool.

[+] codeduck|6 years ago|reply

It's also the name of a hilariously bad courier in the UK - so bad in fact that most people call it Herpes instead.

[+] fwip|6 years ago|reply

I think it's more a case of the same inspiration (Hermes, the Greek messenger of the gods) than irony.

[+] RickJWagner|6 years ago|reply

My thoughts exactly. (I immediately remembered the old-time JMS UI. I think it was on SourceForge.)