top | item 43838056

(no title)

cr3ative | 10 months ago

This is in such a thick academic style that it is difficult to follow what the problem actually might be and how it would impact someone. This style of writing serves mostly to remind me that I am not a part of the world that writes like this, which makes me a little sad.

glutamate|10 months ago

In the beginning, when you read papers like this, it can be hard work. You can either give up or put some effort in to try to understand it. Maybe look at some of the other Jepsen reports, some may be easier. Or perhaps an introductory CS textbook. With practice and patience it will become easier to read and eventually write like this.

You may not be part of that world now, but you can be some day.

EDIT: forgot to say, i had to read 6 or 7 books on Bayesian statistics before i understood the most basic concepts. A few years later i wrote a compiler for a statistical programming language.

cr3ative|10 months ago

I’ll look to do so, and appreciate your pointers. Thank you for being kind!

concerndc1tizen|10 months ago

The state of the art is always advancing, which greatly increases the burden of starting from first principles.

I somewhat feel that there was a generation that had it easier, because they were pioneers in a new field, allowing them to become experts quickly, while improving year-on-year, being paid well in the process, and having great network and exposure.

Of course, it can be done, but we should at least acknowledge that sometimes the industry is unforgiving and simply doesn't have on-ramps except for the privileged few.

unknown|10 months ago

[deleted]

jorams|10 months ago

It uses a lot of very specific terminology, but the linked pages like the one on "G-nonadjacent" do a lot to clear up what it all means. It is a lot of reading.

Essentially: The configuration claims "Snapshot Isolation", which means every transaction looks like it operates on a consistent snapshot of the entire database at its starting timestamp. All transactions starting after a transaction commits will see the changes made by that transaction. Jepsen finds that the snapshot a transaction sees doesn't always contain everything that was committed before its starting timestamp. Transactions A and B can both commit their changes, then transactions C and D can start with C only seeing the change made by A and D only seeing the change made by B.
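A rough way to see why that last case is a problem: if commits happen in one agreed order and each snapshot is a prefix of that order, then any two snapshots must be nested (one contains the other). The sketch below checks only that simplified condition. The names `snapshots_form_chain`, `observed`, and `healthy` are made up for illustration, and this is not how Jepsen's checker works.

```python
from itertools import combinations


def snapshots_form_chain(snapshots):
    """Return True if every pair of snapshots is ordered by set inclusion.

    `snapshots` maps a reader transaction to the set of committed writer
    transactions whose changes it observed (a hypothetical representation).
    """
    return all(a <= b or b <= a for a, b in combinations(snapshots.values(), 2))


# The anomaly described above: C sees only A, D sees only B.
observed = {"C": frozenset({"A"}), "D": frozenset({"B"})}

# A history that one commit order (A, then B) can explain.
healthy = {"C": frozenset({"A"}), "D": frozenset({"A", "B"})}

print(snapshots_form_chain(observed))  # False
print(snapshots_form_chain(healthy))   # True
```

No single ordering of A and B explains both readers in `observed`, which is the contradiction in miniature.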

deathanatos|10 months ago

I empathize with the feeling of this being dense and unapproachable; I remember when I was first approaching these posts, and feeling the same.

For this particular one, the graph under "Results" is the most approachable portion, I think. (Don't skip the top two sections, though … and they're so short.) In the graph, each line is a transaction, and read them left-to-right.

Hopefully I get this right, though if I do not, I'm sure someone will correct me. Our database is a set of ordered lists of integers. Something like,

  CREATE TABLE test (
    id int primary key,
    -- (but specifically, this next column holds a list of ints, e.g.,
    --  a value might be, '1,8,11'; the list of ints is a comma separated
    --  string.)
    list text not null
  );

The first transaction:

  a 89 9

This is shorthand; it means "(a)ppend to list #89 the integer 9" (in SQL, crudely, this is perhaps something like

  UPDATE test SET list = CONCAT(list, ',9') WHERE id = 89;

… though we'd need to handle the case where the list doesn't exist yet, turning it into an `INSERT … ON CONFLICT … DO UPDATE …`, so it would get gnarlier.[2]); the next:

  r 90 nil    # read list 90; the result is nil
  r 89 [4 9]  # read list 89; the result is [4, 9]
  r 90 nil    # read list 90; the result is (still) nil

I assume you can `SELECT` ;) That should provide sufficient syntax for one to understand the remainder.
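For anyone who wants to poke at the two operations, here is a minimal sketch of "append" and "read" against the table above. It assumes SQLite (via Python's `sqlite3`, with SQLite 3.24 or newer for upsert support) rather than the database under test, and the helper names `append` and `read` are mine, not Jepsen's.

```python
import sqlite3


def append(conn, list_id, value):
    """'a <list_id> <value>': append value, creating the list if needed."""
    conn.execute(
        "INSERT INTO test (id, list) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET list = list || ',' || excluded.list",
        (list_id, str(value)),
    )
    conn.commit()


def read(conn, list_id):
    """'r <list_id>': return the list of ints, or None if it doesn't exist."""
    row = conn.execute("SELECT list FROM test WHERE id = ?", (list_id,)).fetchone()
    return None if row is None else [int(x) for x in row[0].split(",")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, list TEXT NOT NULL)")

append(conn, 89, 4)
append(conn, 89, 9)
print(read(conn, 89))  # [4, 9]
print(read(conn, 90))  # None
```

A single in-memory connection like this obviously can't reproduce the anomaly; it only pins down what the shorthand means.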

The arrows indicate the dependencies; if you click "read-write dependencies"[1], that page explains it.

Our first transaction appends 9 to list 89. Our second transaction reads that same list, and sees that same 9, thus, it must start after the first transaction has committed. The remaining arrows form similar dependencies, and once you take them all into account, they form a cycle; this should feel problematic. Snapshot isolation does not permit this cycle, so we've observed a contradiction in the system: these transactions cannot be obeying snapshot isolation. (This is what "To understand why this cycle is illegal…" gets at; it is fairly straightforward. T₁ is the first row in the graph, T₂ the second, and so forth. But it is only straightforward once you've understood the graph, I think.)
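If it helps to see "the arrows form a cycle" mechanically, here is a small depth-first search that returns a cycle from a dependency graph when one exists. The edges in `deps` are invented for illustration and are not the ones from the report. The sketch also ignores edge types (write-read, read-write, and so on), which the report's anomaly classification depends on, so treat it as "is there a loop at all" and nothing more.

```python
def find_cycle(graph):
    """Return a list of nodes forming a cycle (first == last), or None.

    `graph` maps each transaction to the transactions that must come
    after it (a hypothetical adjacency-list representation).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Made-up dependencies: each transaction must precede the next, and the
# last must precede the first, which is impossible.
deps = {"T1": ["T2"], "T2": ["T3"], "T3": ["T4"], "T4": ["T1"]}
print(find_cycle(deps))  # ['T1', 'T2', 'T3', 'T4', 'T1']

acyclic = {"T1": ["T2"], "T2": ["T3"], "T3": []}
print(find_cycle(acyclic))  # None
```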

> This is in such a thick academic style that it is difficult to follow what the problem actually might be and how it would impact someone.

I think a lot of this is because it is written with precision, and that precision requires a lot of academic terminology.

Some of it is just syntax peculiar to Jepsen, which I think comes from Clojure, which I think most of us (myself included) are just not familiar with. Hence why I used SQL and comma-sep'd lists in my commentary above; that is likely more widely read. It's a bit rough when you first approach it, but once you get the notation, the payoff is worth it, I guess.

More generally, I think once you grasp the graph syntax & simple operations used here, it becomes easier to read other posts, since they're mostly graphs of transactions that, taken together, make no logical sense at all. Yet they happened!

> This style of writing serves mostly to remind me that I am not a part of the world that writes like this, which makes me a little sad.

I think Jepsen posts, with a little effort, are approachable. This post is a good starter post; normally I'd say Jepsen posts tend to inject faults into the system, as we're testing if the guarantees of the system hold up under stress. This one has no fault injection, though, so it's a bit simpler.

Beware though, that if you learn to read these, that you'll never trust a database again.

[1]: https://jepsen.io/consistency/dependencies

[2]: I think this is it? https://github.com/jepsen-io/postgres/blob/225203dd64ad5e5e4... — but this is pushing the limits of my own understanding.

mdaniel|10 months ago

> Beware though, that if you learn to read these, that you'll never trust a database again.

I chuckled, but (while I don't have links to offer) I could have sworn that some of them actually passed, and that a handful of others took the report to heart and fixed the bugs. I also recall a product showing up to its Show HN or Launch HN with a Jepsen report in hand, and I was especially in awe of the maturity that showed (assuming, of course, I'm not hallucinating such a thing)

joevandyk|10 months ago

[deleted]

rezonant|10 months ago

Posting ChatGPT outputs directly in a post with no attribution or indication that you are doing so is not helpful or authentic.

Sesse__|10 months ago

Hello ChatGPT.

senderista|10 months ago

Great summary, could you share the prompt you used?

belter|10 months ago

Please remove this LLM generated post

bananapub|10 months ago

posting this sort of LLM-generated garbage should get a ban.

have some respect for yourself and everyone else, christ.

ZYbCRq22HbJ2y7|10 months ago

> such a thick academic style

Why? Because it has variables and a graph?

What sort of education background do you have?

renewiltord|10 months ago

It's maximal information communication. Use LLM to distill to your own knowledge level. It is trivial with modern LLM. Very good output in general.

benatkin|10 months ago

It addresses the reader no matter how knowledgeable they are. It's a very good use of hypertext, making it so that a knowledgeable reader won't need to skip over much.

vlovich123|10 months ago

Have you tried using an LLM? I’ve found good results getting at the underlying concepts and building a mental model that works for me that way. It makes domain expertise - that often has unique terminology for concepts you already know or at least know without a specific name - more easily accessible after a little bit of a QA round.

vlovich123|10 months ago

Lots of downvotes with no actual explanation of what the issue is with my suggestion.

I’ve repeatedly used ChatGPT and Claude to help me understand papers and to cut through the verbiage to the underlying concepts.