(no title)
krka | 12 years ago
On Monday I added some slightly more proper benchmark code; you can find it at https://github.com/spotify/sparkey/blob/master/src/bench.c
I didn't add the leveldb code to this benchmark, however, since (1) I didn't want to manage that dependency and (2) I didn't know how to write optimized code for it.
I'm using very small records, a couple of bytes each for key and value. The insert order is strictly increasing (key_0, key_1, ...), though that doesn't really matter for sparkey since it uses a hash for lookups instead of ordered lists or trees.
As for the symas mdb microbench, I only looked at it briefly, but it seems like it's not actually reading the value it fetches, only looking up where the value is stored. Is that correct?
"MDB's zero-memcpy reads mean its read rate is essentially independent of the size of the data items being fetched; it is only affected by the total number of keys in the database."
Doing a lookup and never using the value seems like a very unrealistic use case.
Here's the part of the benchmark I'm referring to:

    for (int i = 0; i < reads_; i++) {
      const int k = rand_.Next() % reads_;
      key.mv_size = snprintf(ckey, sizeof(ckey), "%016d", k);
      mdb_cursor_get(cursor, &key, &data, MDB_SET);
      FinishedSingleOp();
    }
hyc_symas | 12 years ago
http://symas.com/mdb/memcache/ and http://symas.com/mdb/hyperdex/ give results where the records are transmitted across the network.
krka | 12 years ago
In any case, I think the easiest way to get a fair benchmark is to at least iterate over the value, and possibly also compare it against the expected contents. If that time turns out to be significant (perhaps even dominant) compared to the lookup itself, then further optimization of the storage layer is pretty meaningless.
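To illustrate the kind of value-touching work I mean, here's a minimal, self-contained sketch (not tied to any particular store; the helper name and the stand-in value are hypothetical) that folds every byte of a fetched value into a checksum, so the read can't be elided as dead code:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: force the benchmark to actually read every
     * byte of the fetched value by summing it into a checksum. */
    static uint64_t touch_value(const unsigned char *data, size_t len) {
        uint64_t sum = 0;
        for (size_t i = 0; i < len; i++) {
            sum += data[i];
        }
        return sum;
    }

    int main(void) {
        /* Stand-in for a value returned by a lookup. */
        const char *value = "value_0";
        uint64_t sum = touch_value((const unsigned char *)value,
                                   strlen(value));
        /* Printing (or accumulating) the checksum keeps the compiler
         * from optimizing the value reads away. */
        printf("%llu\n", (unsigned long long)sum);
        return 0;
    }

In the mdb loop above, the equivalent change would be to pass data.mv_data and data.mv_size through such a helper after each mdb_cursor_get call.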