samhw | 3 years ago

Yeah, I 100% agree that many of those problems are inherent to distributed databases. There are a few interesting ones which kinda straddle the line in that respect – stuff like counters and the aforementioned individually tunable consistency, which make it too easy (in practice) for individual engineers to trigger classic dist-sys failure modes – but largely its problems are the problems of distributed systems. Lots of its other problems are the problems of using an over-complex, eventually-consistent, write-optimal (etc) distributed system for a problem that doesn't require it (where e.g. Redis Cluster would be far better). I'd maybe throw a few more on top: a general theme of many incidents we encountered was "we did something complex/accidentally-pathological and Cassandra froze up entirely due to [consistency / compaction / repair / GC] stuff". It did feel from many of those issues like it was a victim of its own complexity, more than anything else. (That theme also applies to lots of the 'operator errors'.)

Also, sorry, I think I was a bit unfair to Java. I'm not an anti-GC militant. I'd be the first person to point out the haziness of the distinction between tracing and (say) reference counting in the first place, or indeed with the indexing/defragmentation/etc space + work required of a malloc implementation. I'd consider a GCed language like Go - though I personally hate it and feel it utterly joyless - to be an improvement. It's more about the inherent complexity of adding a p-code machine like the JVM on top of the already-colossal complexity of a modern database. For what it's worth, for clarity, I've barely written any Java and I'm intimately unfamiliar with Java development, and despite giving my opinion I'm well aware it's not a very informed one. I do agree with your point about its isolating the 'unit' of your software from the particularities of any given hardware and making it more easily jepsenable - I hadn't considered that. And some of the stuff happening in the Java space, like Graal and (as you say) Loom, is very impressive.

I'll amend my original comment to make it a bit clearer that most of this is not really Cassandra's fault, and that its faults aren't really more numerous or more severe than those of any other database. It's evidently a huge success and its value to people is undeniable - I don't mean to depreciate your work. I forget that there's a non-negligible chance of relevant people reading my comments on here (like, less congenially, the time I accidentally summoned Br*ndan E*ch: https://news.ycombinator.com/item?id=28792436). I don't work with Cassandra any more, so I'm unlikely to have many practical questions, but thanks for the offer and I'll certainly reach out if I find myself in that space again! Really appreciate your being so magnanimous about my not-very-magnanimous (pusillanimous?) comment.

_benedict | 3 years ago

Please, no need to apologise! Your criticisms were all entirely well founded and the pain points you mentioned very real. I thought you expressed them considerately (I have seen plenty of vents that did not). I may reflexively defend Cassandra, but healthy and honest discussions around these things are great IMO.

> It did feel from many of those issues like it was a victim of its own complexity

I think there’s some truth in this, but I think the bigger problem was failing to give this complexity its proper respect (which would have been very costly and slowed feature development - perhaps consigning Cassandra to an also-ran position like others in the space, who knows?)