

mehrant | 11 months ago

for the time being, have a look at this please: http://hpkv.io/videos/performance_local.webm

this is 1M records, 3M operations on a single node, single thread, recorded in real time (1x).

I understand that without access to the source of the test program it's hard to trust, but we can arrange that if you decide to take that call :)


pclmulqdq|11 months ago

The question from most of us isn't "did you get that number," it's "what does that number actually mean?" Writes don't need to return any data, so you can sort of set that latency number arbitrarily by changing the meaning of "write done." I can make "redis with 0 write latency" by returning a "write done" immediately after the packet lands, but then the meaning of "write done" is effectively nil.

In every persistent database, that number indicates that an entry was written to a persistent write-ahead log and that the written value will stay around if the machine crashes immediately after the write. Clearly you don't do this because it's impossible to do in 600 ns. For most of the non-persistent databases (eg redis, memcached), write latency is about how long it takes for something to enter the main data structure and become globally readable. Usually, "write done" also means that the key is globally readable with no extra performance cost (ie it was not just dumped into a write-ahead log in memory and then returned).
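The two meanings of "write done" described above can be sketched as follows. This is a minimal illustration, not how HPKV, Redis, or memcached are implemented: the class names, the tab-separated log format, and the dict-backed store are all assumptions made up for the example. The only point is where the acknowledgement happens relative to `os.fsync`.

```python
import os
import tempfile


class InMemoryStore:
    """Hypothetical store: acks once the key is readable from the main structure."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        return True  # "write done": visible to readers, lost if the process dies


class DurableStore(InMemoryStore):
    """Hypothetical store: acks only after the log record is forced to disk."""

    def __init__(self, wal_path):
        super().__init__()
        self.wal = open(wal_path, "ab")

    def put(self, key, value):
        record = f"{key}\t{value}\n".encode()
        self.wal.write(record)
        self.wal.flush()             # push Python's buffer to the OS
        os.fsync(self.wal.fileno())  # ask the OS to push it to the device
        self.data[key] = value
        return True  # "write done": the record is in the log before the ack

    def close(self):
        self.wal.close()


if __name__ == "__main__":
    wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
    durable = DurableStore(wal_path)
    durable.put("user:1", "alice")
    durable.close()
    print(os.path.getsize(wal_path), "bytes in the write-ahead log")
```

Both `put` methods return the same `True`, which is why a latency figure only means something once it says which of the two paths was timed.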

In a world where you spoke about the product more credibly or where the code was open-source, I might accept that this was the case. As it stands, it looks like:

1. This was your "marketing gimmick" number that you are trying to sell (every database that isn't postgres has one).

2. You got it primarily by compromising on the meaning of "write done," and not on the basis of good engineering.

mehrant|11 months ago

Thank you for your thoughtful critique.

To clarify what our numbers actually mean and address your main question of "what does that number actually mean":

1- The 600ns figure represents precisely what you described - an in-memory "write done" where memory structures are updated and the data becomes globally readable to all processes. This is indeed comparable to what Redis without persistence or memcached provides. Even on this comparable measurement basis (which isn't our marketing gimmick, but the same standard used by in-memory stores), we're still 2-6x faster than Redis depending on access patterns.

For full persistence guarantees, our mean latency increases to 2582ns per record (600ns in-memory operation + 1982ns disk commit) for our benchmark scenario with 1M records and 100-byte values. This represents the complete durability cycle. It should be compared with, for example, Redis with AOF enabled.

2- I agree that the meaning of "write done" requires clear context. We've been focusing on the in-memory performance advantages in our communications without adequately distinguishing between in-memory and persistence guarantees.

We weren't trying to hide the disk persistence number. We simply used "write done" because our comparison was against Redis without persistence, but mentioning persistence alongside it caused understandable confusion. That was bad on our part.

Based on your feedback, we'll update our documentation to provide more precise metrics that clearly separate these operational phases and their respective guarantees.

UPDATE:

clarification on mean disk write measurement:

the mean value is the total time to flush the whole write buffer (processed in parallel, depending on the number of available cpu cores) divided by the number of records. the total time to process and write 1M records as described above was 1982ms, which makes the mean write time per record 1982ns.
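The arithmetic behind the 1982ns and 2582ns figures can be reproduced as below. The function name is made up for illustration; the inputs are the numbers quoted in this thread. Note that, as described, this is an amortized mean over one batched flush, so it says how long the batch took per record, not how long any single write waited for its own disk commit.

```python
def mean_ns_per_record(total_flush_ms, num_records):
    """Amortized time per record: total batch flush time / record count."""
    total_ns = total_flush_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return total_ns / num_records


in_memory_ns = 600                               # in-memory "write done" figure
disk_ns = mean_ns_per_record(1982, 1_000_000)    # 1982 ms to flush 1M records

print(disk_ns)                 # 1982.0
print(in_memory_ns + disk_ns)  # 2582.0
```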