TheHydroImpulse|8 years ago
NSQ has served us pretty well, but long-term persistence has been a major concern for us: if any of our NSQ nodes goes down, it's a big problem.
Kafka has been far more complicated to operate in production, and developing against it requires more thought than NSQ (where you can just consume from a topic/channel, ack the message, and be done). On top of that, with NSQ, if you want more capacity you can just scale up your services. With Kafka we had to plan how many partitions we needed, and autoscaling has become a bit trickier.
We now have critical services running against Kafka and have started moving our whole pipeline to it as well. It's a slow process, but we're getting there.
We've had to build some tooling to operate Kafka and to ramp everyone else up on how to use it. To be fair, we've also had to build tooling for NSQ, specifically nsq-lookup, to allow us to scale up.
We have an nsq-go library that we use in production along with some tooling: https://github.com/segmentio/nsq-go
doh|8 years ago
Could you comment on particular problems and challenges that you ran into?
For context, we're currently sending around 60k messages/sec, and around 1k of those contain data larger than 10 KB.
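A rough lower bound on the traffic from the large messages alone, assuming the 10 KB threshold means kilobytes (1 KB = 1,000 bytes). The other ~59k messages/sec add to this, and the replication factor of 3 in the last line is an assumption, not something stated in the thread:

```latex
\begin{aligned}
B_{\text{large}} &> 1000\ \tfrac{\text{msg}}{\text{s}} \times 10\ \tfrac{\text{KB}}{\text{msg}} = 10\ \text{MB/s} \\
D_{\text{large}} &> 10\ \text{MB/s} \times 86\,400\ \text{s/day} = 864\ \text{GB/day} \\
D_{\text{disk}} &> 3 \times 864\ \text{GB/day} \approx 2.6\ \text{TB/day} \quad (\text{replication factor } 3)
\end{aligned}
```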
TheHydroImpulse|8 years ago
If you can get away with using PubSub or something similar, it would be far easier than managing your own Kafka deployment (correctly).
If data loss is unacceptable, then Kafka is basically the only open-source solution known for not losing data (if done correctly, of course). NSQ was great but lacked durability and replication. With Kafka we can guarantee that two or more brokers have persisted a message before moving on. With NSQ, if one of our instances died, it was a big problem.
Managing Kafka in a cloud environment hasn't been easy: it has required a lot of investment, and we have yet to move everything over to it.
bstahlhood|8 years ago
https://nats.io/documentation/streaming/nats-streaming-intro...
kasey_junk|8 years ago
molszanski|8 years ago
doh|8 years ago
I must say that the stability is great, even with larger payloads (over 10MB in size). We've been running it in production for a couple of weeks now and haven't had any issues. The main limitation is that there is no federation or large-scale clustering. You can have a pretty robust cluster, but each node can only forward once, which is limiting.
ah-|8 years ago
TheHydroImpulse|8 years ago
We now run Kafka on ECS and wrote some tooling to manage rolling the cluster and replacing brokers. krollout(1) (currently private) basically prevents partitions from becoming unavailable during a roll.
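krollout(1) is private, so its logic isn't known. One plausible safety check for a rolling restart, sketched in Go as an assumption rather than a description of that tool: before stopping a broker, confirm that every partition would still have at least min.insync.replicas in-sync replicas without it. PartitionState and canStop are hypothetical names, and fetching the ISR metadata from the cluster is left out.

```go
package main

import "fmt"

// PartitionState is hypothetical metadata for one partition; in practice
// it would come from the cluster's topic metadata.
type PartitionState struct {
	Topic     string
	Partition int
	ISR       []int // broker IDs currently in sync
	MinISR    int   // min.insync.replicas for the topic
}

// canStop reports whether stopping broker keeps every partition at or
// above its MinISR, and lists the partitions that would block it.
func canStop(broker int, parts []PartitionState) (bool, []string) {
	var blocked []string
	for _, p := range parts {
		remaining := 0
		for _, id := range p.ISR {
			if id != broker {
				remaining++
			}
		}
		if remaining < p.MinISR {
			blocked = append(blocked, fmt.Sprintf("%s-%d", p.Topic, p.Partition))
		}
	}
	return len(blocked) == 0, blocked
}

func main() {
	parts := []PartitionState{
		{Topic: "orders", Partition: 0, ISR: []int{1, 2, 3}, MinISR: 2},
		{Topic: "orders", Partition: 1, ISR: []int{1, 2}, MinISR: 2}, // broker 3 is lagging
	}
	for _, b := range []int{1, 2, 3} {
		ok, blocked := canStop(b, parts)
		fmt.Printf("broker %d: safe=%v blocked=%v\n", b, ok, blocked)
	}
}
```

A rollout loop built on this would stop one broker at a time, wait for it to rejoin the ISR of every partition it hosts, and only then re-run the check for the next broker. A partition already below its minimum blocks every broker, which is arguably the right behavior: the roll pauses until the cluster is healthy again.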
Now that multiple teams are using Kafka, we've started exploring how to scale it up. Each team may have different requirements, and isolation can become an issue. We'll likely need to build more tooling around this.