
Jepsen: Redpanda 21.10.1

193 points | aphyr | 3 years ago | jepsen.io | reply

59 comments

[+] doctor_eval|3 years ago|reply
I just came here to thank Jepsen for these amazing reports. What a wonderful way to use your intellect to contribute to the wellbeing of the entire community.

Also I wanted to say to Redpanda: I was on the fence, but now I'm convinced. I'll definitely be deploying it on my next project, which has already kicked off. I only wish I could run it natively on macOS instead of requiring Docker.

[+] rystsov|3 years ago|reply
Hey folks, I was working with Kyle Kingsbury on this report from the Redpanda side and I'm happy to help if you have questions
[+] cgaebel|3 years ago|reply
Thanks for working with Jepsen. Being willing to subject your product to their testing is a huge boon for Redpanda's credibility.

I have two questions:

1. How surprising were the bugs that Jepsen found?

2. Besides the obvious regression tests for bugs that Jepsen found, how did this report change Redpanda's overall approach to testing? Were there classes of tests missing?

[+] polio|3 years ago|reply
A complete nit, but the testimonial from the CTO of The Seventh Sense on https://redpanda.com/ spells Redpanda as "Redpand".
[+] mandevil|3 years ago|reply
I was unfamiliar with Redpanda, and now I know and trust it. Whatever marketing budget Redpanda spent to get a Jepsen report was well worth it.
[+] divan|3 years ago|reply
I happened to know the Redpanda founder back in the day, when he was at Concord.io (as a founder and the main dev). The level of obsession this guy had with performance and optimization was insane. He's not only extremely skilled with C++, but also very passionate about rethinking large and complex systems and rebuilding them to enable 10-100x speed improvements. It's like his personal hobby – take a piece of software everyone uses and optimize it to the limits of physics, usually by implementing a better version from scratch himself :) Plus, he's an excellent communicator. Watching their team work, I always thought that successful companies can only be built with that level of passion and expertise in a single package.
[+] belter|3 years ago|reply
Agree. Nowadays, I view anything that did not go through Jepsen with suspicion. It forces me to do triple the technical due diligence.
[+] debarshri|3 years ago|reply
The thing that got my attention was that it has inline transform functions that can be added as WASM binaries.
[+] hardwaresofton|3 years ago|reply
One of the clearest indications prices for a service should be raised I’ve ever seen.

Can we get patio11 in here to say the thing?

[+] titanomachy|3 years ago|reply
The level of intellectual discipline and competence on display here is inspiring.

I'd love to take one of the Jepsen courses, but it seems they're offered only as corporate training. Maybe my employer will agree to bring them in.

For now I'll have to satisfy myself with the YouTube videos.

[+] CJefferson|3 years ago|reply
This isn't anything against Redpanda, but I'm always amazed how badly all these distributed databases do in Jepsen.

What would one use them for in practice that wouldn't be better served by something I've actually used, say PostgreSQL with streaming replication in case the server goes down? (I'm not suggesting there isn't a good application, just that I'm not knowledgeable enough to know of one.)

[+] jandrewrogers|3 years ago|reply
When a distributed database is designed, you must navigate and optimize several complex technical tradeoffs to meet the architecture and product objectives. The specific set of tradeoffs made -- and they are different for every platform -- will determine the kinds of data models and workloads that the database will be suitable for, especially if performance and scalability are critical as in this case.

The reason distributed databases tend to be buggy, especially in the first iterations, is straightforward if not simple to address. While it is convenient to describe technical design tradeoffs as a set of discrete, independent things, in real implementation they are all interconnected in subtle, complex, nuanced ways. Modifying one design tradeoff in code can have unanticipated consequences for other intended tradeoffs. In other words, there isn't a set of simple tradeoffs, there is a single extremely high-dimensionality tradeoff that is being optimized. Not only are complex high-dimensionality design elements difficult to reason about when writing code the first time, any changes to the code may shift how the tradeoffs interact in non-obvious ways. Humans have finite cognitive budgets, so unless it is obvious that a code change has the potential to have unintended side effects, we generally don't spend the time to fully verify this fact.

I can't tell you how many times I've seen tiny innocuous code changes alter the behavior of distributed databases in surprising ways. This is also why once the core code seems to be correct, people are reluctant to modify it if that can be avoided at all.

[+] georgelyon|3 years ago|reply
I'm constantly surprised more folks don't use FoundationDB, I'm pretty sure the Jepsen folks said something to the tune of the way FoundationDB is tested is far beyond what Jepsen does (Good talk on FDB testing: https://www.youtube.com/watch?v=4fFDFbi3toc).

My read is that most use cases just need something that works _enough_ at scale that the product doesn't fall over and any issues introduced by such bugs can be addressed manually (i.e. through customer support, or just sufficient ad-hoc error handling). Couple that with the investment some of these databases have put into onboarding and developer-acquisition, and you have something that can be quite compelling even compared to something which is fundamentally more correct.

[+] rystsov|3 years ago|reply
Different systems solve different problems and have different functional characteristics. Actually, one of the things Kyle highlighted in his report is write cycles (the G0 anomaly); it isn't a problem with the Redpanda implementation but a fundamental property of the Kafka protocol. Records in the Kafka protocol don't have preconditions and don't overwrite each other (unlike database operations), so it doesn't make sense to enforce an order on the transactions, and it's possible to run them in parallel. That gives enormous performance benefits without compromising safety.
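A toy sketch (pure Python, not Redpanda or Kafka code) of why a G0 write cycle can be benign for append-only logs: two transactions land in opposite orders on two partitions, so no serial order exists, yet nothing is overwritten or lost.

```python
# Model Kafka-style partitions as append-only lists. Records carry no
# preconditions and never overwrite each other, so two transactions may
# interleave in opposite orders on different partitions (a G0 write
# cycle) without corrupting any data.

p0, p1 = [], []  # two partitions

# Transaction T1 appends to p0 then p1; T2 appends to p1 then p0.
p0.append("T1-a")   # T1 before T2 on partition 0
p1.append("T2-a")   # T2 before T1 on partition 1
p1.append("T1-b")
p0.append("T2-b")

# T1 < T2 on p0, but T2 < T1 on p1: no single serial order exists (G0).
assert p0.index("T1-a") < p0.index("T2-b")
assert p1.index("T2-a") < p1.index("T1-b")

# Yet every record from both transactions is present exactly once.
assert sorted(p0 + p1) == ["T1-a", "T1-b", "T2-a", "T2-b"]
```

In a database, the same cycle over read-modify-write operations could silently discard an update; over blind appends, both outcomes keep all the data.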
[+] claytonjy|3 years ago|reply
There are a lot of different ways to answer this, but I think of it as a different architectural paradigm. Yes, you can do stream-ish things with Postgres, but at some level of scale you'd be putting a square peg in a round hole.

What opened my eyes to this world is this post from Martin Kleppman on turning the database inside out: https://martin.kleppmann.com/2015/03/04/turning-the-database...

[+] Too|3 years ago|reply
CAP theorem and configuration. When choosing a distributed database you are, in a way, already by definition giving up a chunk of the C, consistency.

Once you have a distributed database, it often has a myriad of tuning parameters that all affect which corner of the CAP triangle you end up in. Using these, you have to choose what risks you are willing to accept. If all my replicas are in the same rack, any timeout issues found would often be merely academic, and I can pragmatically design such a system very differently than if they are in different parts of the globe.
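One classic example of such a tuning knob, sketched here as a toy function (not tied to any particular database): with N replicas, a write quorum W and read quorum R only guarantee that reads see the latest write when the quorums must overlap.

```python
# Toy quorum rule: with N replicas, write quorum W, and read quorum R,
# every read is guaranteed to intersect the latest write iff W + R > N.
# Shrinking W or R buys latency/availability at the cost of consistency,
# which is exactly the kind of CAP tradeoff these parameters tune.

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum must intersect every write quorum."""
    return w + r > n

# N=3 with majority writes and majority reads: overlap guaranteed.
assert quorums_overlap(3, 2, 2)

# N=3 with W=1 (fast, highly available writes): a read of 2 replicas
# can miss the only copy of the latest write.
assert not quorums_overlap(3, 1, 2)
```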

I might also have such a high influx of low-value data that I can accept some losses; the cost of a total deadlock would be more than just one transaction lost in cyberspace, and not everyone is building a bank. That said, inconsistent data still sucks a lot regardless of your application, so in such cases aim for lost data rather than wrong data. In theory the distributed approach is then considered wrong, but in practice it might just be good enough.

This also ties into how you model your data: a lot of the faults found in the latest MongoDB analysis were around multi-document transactions, but nobody uses MongoDB that way; mostly it's just a place where you dump standalone documents.

In the end, you have to go back to the original question of why you are choosing a distributed DB in the first place: is it for scale, HA, regionality, or other reasons? Then design around that; it's never a silver-bullet fix-all solution.

Taking your example of streaming replication: how would that behave if the primary acked to the client and then crashed before the replica received the transaction? The alternative, waiting for the replica before you ack, instead gives you two sources of failure. You've now reinvented a distributed system and are in the same soup as all these other databases, just with other tradeoffs. :)
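The ack-timing window described above can be sketched with a toy primary/replica pair (all names hypothetical, not PostgreSQL code): async replication can acknowledge a write that dies with the primary, while sync replication makes the replica part of the write path.

```python
# Toy model of the replication-ack window: async replication may ack a
# write that the replica never receives; sync replication closes the
# window but adds the replica as a second failure point on every write.

class Replica:
    def __init__(self):
        self.log = []

    def apply(self, record):
        self.log.append(record)

class Primary:
    def __init__(self, replica, synchronous):
        self.log = []
        self.replica = replica
        self.synchronous = synchronous

    def write(self, record):
        self.log.append(record)
        if self.synchronous:
            self.replica.apply(record)  # replicate BEFORE acking
        return "ack"                    # async: ack now, replicate later

# Async case: client gets an ack, then the primary crashes before
# replicating. After failover to the replica, the acked write is gone.
r = Replica()
p = Primary(r, synchronous=False)
assert p.write("tx1") == "ack"
# ... primary crashes here; replica is promoted ...
assert "tx1" not in r.log  # acknowledged write lost

# Sync case: the ack implies the replica has the record, at the cost of
# the write also failing whenever the replica is unreachable.
r2 = Replica()
p2 = Primary(r2, synchronous=True)
assert p2.write("tx1") == "ack"
assert "tx1" in r2.log
```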

[+] agallego|3 years ago|reply
totally different approaches tho. people have tried what you proposed many times before and for some scale succeeded. hard to compare at all when you dig into the details.

expect a companion post. this was super fun to partner with kyle on this. +1 would recommend to anyone building a storage system.

[+] gigatexal|3 years ago|reply
If your DB doesn't pass the Jepsen tests it's not worth using. Kudos to both teams.
[+] antonmry|3 years ago|reply
This report seems to have some wrong insights. Auto-committing offsets doesn't imply data loss if records are processed synchronously. That is the safest way to test Kafka, rather than committing offsets manually.
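The commit-timing argument here can be sketched with a toy consumer loop (pure Python, not the Kafka client): loss only appears when an offset is committed before the record it covers has been processed; committing after synchronous processing re-reads records instead of skipping them.

```python
# Toy consumer: records are processed in order, and the consumer may
# crash mid-stream. The only question is whether the offset is committed
# before or after processing each record.

def consume(records, commit_before_processing, crash_at):
    """Return (committed_offset, processed) for a consumer that crashes
    after fully handling `crash_at` records."""
    committed, processed = 0, []
    for offset, rec in enumerate(records):
        if commit_before_processing:
            committed = offset + 1       # offset committed up front
        if len(processed) == crash_at:
            return committed, processed  # crash mid-stream
        processed.append(rec)
        if not commit_before_processing:
            committed = offset + 1       # committed only once processed
    return committed, processed

records = ["a", "b", "c"]

# Commit after synchronous processing: on restart we resume at the
# committed offset and never skip an unprocessed record (at-least-once;
# "b" is re-read, not lost).
committed, done = consume(records, commit_before_processing=False, crash_at=1)
assert records[committed:] == ["b", "c"]

# Commit before processing, crash in between: "b" is skipped (at-most-once).
committed, done = consume(records, commit_before_processing=True, crash_at=1)
assert records[committed:] == ["c"]
```

Auto-commit in the real client commits on a timer, so whether it behaves like the first case or the second depends on whether everything polled has been processed by commit time.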
[+] relay23|3 years ago|reply
Almost nobody auto-commits offsets in real applications, though. If you do, you should really stop :)
[+] doommius|3 years ago|reply
Always great to read these. I performed a Jepsen test on Microsoft internal infra and it gave huge insight. From the academic side, it's just as interesting to look into the lack of standards around consistency and the definitions of it.
[+] rystsov|3 years ago|reply
Cool! What did you test? I played with Jepsen and Cosmos DB when I was at Microsoft, but we had to ditch SSH, write a custom agent, and inject faults with PowerShell cmdlets.
[+] dstroot|3 years ago|reply
> A KafkaConsumer, by contrast, will happily connect to a jar of applesauce14 and return successful, empty result sets for every call to consumer.poll. This makes it surprisingly difficult to tell the difference between “everything is fine and I’m up to date” versus “the cluster is on fire”, and led to significant confusion in our tests.

This tickled my funny bone. Never expected humor in a Jepsen writeup. Kudos!

[+] staticassertion|3 years ago|reply
> Never expected humor in a Jepsen writeup

Jepsen reports are often pretty funny, some famously so

[+] cwillu|3 years ago|reply
Wait until you find out why it's called “Jepsen”
[+] dhshshdhdgfff|3 years ago|reply
The first half is jepsen team trying to divine some actual testable guarantees from a pile of blog posts and a random Google doc. What a mess.
[+] rystsov|3 years ago|reply
The mess is mostly the result of the mismatch between the classic database transactional model and the Kafka transactional model (the G0 anomaly). If you read the documentation without a database background it seems OK, but once you notice the differences between the models it becomes hard to tell whether something is a bug or a property of the Kafka protocol.

There is a lot of research happening in this area, even in the database world. The list of isolation levels isn't final, and some recent developments, including PC-PSI and NMSI, also seem to "violate" order. I hope one day we get a formal academic description of the Kafka model. It looks very promising.

[+] jeffbee|3 years ago|reply
Total mess. It’s a real indictment of Kafka, more than it is anything about redpanda in the first half.
[+] excuses_|3 years ago|reply
I wonder if Redpanda thinks about or offers some alternative protocol that would be better defined in terms of transaction guarantees. At this point it looks like Kafka’s protocol was a nice try but it needs a major refactoring.
[+] newman314|3 years ago|reply
Redpanda (back when they were VectorizedIO) spammed my work email after I starred one of their repos, denied it after I called them out on it and I just noticed that they had deleted their response to me.

Pretty sneaky to go back and delete the tweets, first denying and then apologizing.

Receipts: https://twitter.com/d11cc3s/status/1447573471152656389 https://twitter.com/d11cc3s/status/1450906855115354116

[+] agallego|3 years ago|reply
hi newman314 - i mentioned in the tweet this was a mistake and offered an apology there. an sdr reached out to you; when i realized that, i apologized. no ill intent. feel free to test this with a fake github account. my tweets automatically delete after 6mo, all of them, on a rolling window. nothing special about this interaction. there is no sneakiness, though feel free to disagree.
[+] staticassertion|3 years ago|reply
Sounds like you have a personal, singular issue with them that I can't imagine anyone else cares about.