Well, some DBs "load" the value in some way before handing it to the user, so that time is implicitly measured for those DBs but not for others, which makes the comparison unfair. I think Tokyo Cabinet gives you a pointer to newly allocated memory, at least for compressed data (though I'm not completely sure about this). Like LMDB, Sparkey also does no processing of the value for uncompressed data, but for compressed data some decompression needs to take place in the iterator buffer (I guess that's equivalent to your cursor object). Even worse, if this is done lazily upon value retrieval, the cost is completely hidden from the benchmark. In any case, I think the easiest way to get a fair benchmark is to at least iterate over the value, and possibly also compare it. If that time turns out to be significant (perhaps even dominant) compared to the actual lookup time, then further optimization of the actual storage layer is pretty meaningless.
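A minimal sketch of that fairness idea: put the byte-by-byte iteration and the comparison inside the timed lookup path, so a store that decompresses lazily can't hide the cost. This uses a plain dict of zlib-compressed values as a stand-in store; the real benchmarks use Sparkey/LMDB, whose APIs differ.

```python
import zlib

# Stand-in "store": a dict of compressed values. Purely illustrative;
# key format mirrors the benchmark's "key_%09d" pattern.
store = {("key_%09d" % i).encode(): zlib.compress(b"value_%d" % i)
         for i in range(1000)}

def fair_lookup(key, expected):
    """Look up `key`, force full materialization of the value by
    iterating over every byte, then compare it to `expected`.
    The iteration/compare cost sits deliberately inside the timed
    path, so lazily-decompressed stores can't hide their work."""
    raw = store[key]               # the "lookup" under test
    value = zlib.decompress(raw)   # explicit decompression step
    checksum = 0
    for b in value:                # touch every byte of the value
        checksum = (checksum + b) & 0xFFFFFFFF
    assert value == expected       # optional comparison pass
    return checksum

fair_lookup(b"key_000000042", b"value_42")
```

Timing this loop against a variant that only fetches `raw` makes the hidden decompression cost visible as the difference between the two runs.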
hyc_symas|12 years ago
This was run on my Dell M4400 laptop, Intel Q9300 2.53GHz quadcore CPU, 8GB RAM. The maximum DB size is around 4GB so this is a purely in-memory test. Your hash lookup is faster than the B+tree, but with compression you lose the advantage.
krka|12 years ago
I am not sure why you changed the key format to "key_%09d" - is that an optimization for lmdb, to make sure the insertion order is the same as the internal tree ordering? If so, why is that needed for the benchmark?
I noticed that the wall time and cpu time for the sparkey 100M benchmarks diverged quite a bit, which would suggest that your OS was evicting many pages or stalling on disk writes. The Sparkey files were slightly larger than 4 GB while the lmdb file was slightly smaller, but I am not sure that really explains it on an 8 GB machine.
I am not sure I agree about the non-linear creation time difference; the benchmarks indicate that both sparkey and lmdb are non-linear. The sparkey creation throughput went from 1206357.25 to 1109604.25 (-8.0%) while lmdb's went from 2137678.50 to 2033329.88 (-4.9%).
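The percentages follow directly from the quoted throughput numbers:

```python
def pct_change(before, after):
    """Relative throughput change, in percent."""
    return (after - before) / before * 100.0

# Creation throughput (keys/s) quoted above, 10M run vs 100M run.
sparkey = pct_change(1206357.25, 1109604.25)
lmdb = pct_change(2137678.50, 2033329.88)

print("sparkey: %+.1f%%" % sparkey)
print("lmdb:    %+.1f%%" % lmdb)
```

Both slowdowns are the same order of magnitude, which is the point: neither system's creation throughput is flat as the dataset grows.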
Regarding the lookup performance "dropping off a cliff", I think that is related to the large difference in wall time vs cpu time, which indicates a lot of page cache misses.
lmdb seems really interesting for large data sets, but I think it's optimized for different use cases. I'd be curious to see how it behaves with more randomized keys and insertion order. I didn't think of doing that in the benchmark since sparkey isn't really affected by it, but it makes sense when benchmarking a b-tree implementation.
Sparkey is optimized for our use case where we mlock the entire index file to guarantee cache hits, and possibly also mlock the log file, depending on how large it is.
The way you append stuff to sparkey (first fill up a log, then build a hash table as a finalization step) is really useful when you need lots of memory while building and can't afford random-seek file operations; in the end, when most of the work is done and your memory is free again, you finalize the database. Of course, you could do the same thing with lmdb, first writing a log and then converting that into an lmdb file.
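The append-then-finalize pattern can be sketched like this: writes go to a sequential log (cheap, no random seeks), and the index mapping keys to log offsets is only built once, at the end. This is a toy in-memory version with an invented record layout, not Sparkey's actual file format or API.

```python
import struct

class LogThenIndexWriter:
    """Toy sketch of the log-then-finalize pattern. Hypothetical
    record layout: <key_len, value_len, key, value>, appended in order."""

    def __init__(self):
        self.log = bytearray()
        self.index = None

    def put(self, key: bytes, value: bytes):
        # Append-only: no seeks, no index maintenance during the build.
        self.log += struct.pack("<II", len(key), len(value)) + key + value

    def finalize(self):
        # One sequential scan of the log builds the lookup structure.
        self.index = {}
        off = 0
        while off < len(self.log):
            klen, vlen = struct.unpack_from("<II", self.log, off)
            key = bytes(self.log[off + 8 : off + 8 + klen])
            self.index[key] = off          # last write wins
            off += 8 + klen + vlen

    def get(self, key: bytes) -> bytes:
        off = self.index[key]
        klen, vlen = struct.unpack_from("<II", self.log, off)
        start = off + 8 + klen
        return bytes(self.log[start : start + vlen])

w = LogThenIndexWriter()
w.put(b"a", b"1")
w.put(b"b", b"2")
w.put(b"a", b"3")   # overwrite: the index, built last, picks the winner
w.finalize()
```

The same shape works for the lmdb variant mentioned above: replace the dict built in `finalize()` with a bulk load into a fresh lmdb file.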
Thanks for taking the time to adapt the benchmark code to lmdb, it's been very interesting.
hyc_symas|12 years ago
Sparkey's lookup performance drops off a cliff at 100M elements. This doesn't seem to be related to raw size because it occurs regardless of compression. LMDB's performance degrades logarithmically, as expected of an O(logN) algorithm.
Hashing is inherently cache-unfriendly, and hash tables are inherently wasteful - they only perform well at low load factors, i.e. when they're mostly empty. They're completely hopeless when scaling to large datasets.
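The load-factor half of that claim is easy to illustrate with a toy linear-probing table: average probes per successful lookup stay near 1 while the table is sparse and climb steeply as it fills. Table size and trial counts here are arbitrary, and this says nothing about Sparkey's specific hash scheme.

```python
import random

def avg_probes(capacity: int, load: float, trials: int = 2000) -> float:
    """Average probes per successful lookup in a linear-probing
    table filled to the given load factor. Toy model."""
    rng = random.Random(1)
    table = [None] * capacity
    keys = []
    for _ in range(int(capacity * load)):
        k = rng.getrandbits(64)
        i = k % capacity
        while table[i] is not None:      # linear probing on collision
            i = (i + 1) % capacity
        table[i] = k
        keys.append(k)
    total = 0
    for k in rng.choices(keys, k=trials):
        i, probes = k % capacity, 1
        while table[i] != k:
            i = (i + 1) % capacity
            probes += 1
        total += probes
    return total / trials

sparse = avg_probes(1 << 14, 0.25)   # ~1.2 probes per lookup
full = avg_probes(1 << 14, 0.95)     # roughly an order of magnitude more
```

Each extra probe at scale is a likely cache (or page) miss, which is where the cache-unfriendliness and the space overhead compound each other.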