top | item 15409757

Modern B-Tree Techniques (2011)

314 points| pointy_hat | 8 years ago |citeseerx.ist.psu.edu

11 comments

order

cryptonector|8 years ago

> Probably the strongest arguments for B-trees over hash indexes pertain to multi-field indexes and to nonuniform distributions of key values. A hash index on multiple fields requires search keys for all those fields such that a hash value can be calculated. A B-tree index, on the other hand, can efficiently support exact-match queries for a prefix of the index key, i.e., any number of leading index fields

Yes, well, if none of the columns/fields in the index are sufficiently selective, then the ability to search a B-tree by prefix may not do you much good. Don't get me wrong though: I fully agree with all of the given reasons for B-tree superiority to hash tables.

Although the newest storage technologies (particularly very fast byte-addressable NVMs) -- too new for TFA -- might well change things. When you have fast byte-addressable storage it will pay to not read pages, and hash tables may yet involve fewer accesses than B-trees in that case, though I suspect B-trees will successfully adapt anyways.

makmanalp|8 years ago

One thing that doesn't make the two interchangeable is total ordering and range queries, which are only supported by B-trees and not by hash indexes. So the question is whether you're willing to trade that away for faster insertion and lookup.

> When you have fast byte-addressable storage it will pay to not read pages

It always pays to not read pages :-) An alternative way to look at it is that faster storage is more forgiving of more accesses, meaning the benefits of hash indexes might be less prominent.

hyc_symas|8 years ago

A lot of folks talk about byte addressability as if it's going to be some massive advantage. It's not, simply for the reason that it's inefficient to store byte-level addresses for everything.

(And of course, you're really only getting cache-line atomicity at the hardware layer. Ignoring this fact, and just using arbitrary byte boundaries for accesses, will kill a lot of perf gain of NVMs.)

hyc_symas|8 years ago

Well, it's an interesting survey. I'm surprised at how much space it devoted to illustrating B-trees in the context of RDBMSs as a lot of the concepts there really had no specific connection to B-trees.

So much space devoted to logging, locking and latches. All heavily used in the industry up till the last decade. All obsolete now. Not a single mention of immutable/persistent B-trees. Were there really no academic papers on the topic written in a time covered by this survey?

mistercow|8 years ago

This is from 2011, in case you missed that. I was confused by sentences like:

>Flash memory, flash devices, and other solid state storage technology are about to change the memory hierarchy in computer systems in general and in data management in particular.

until I looked up at the publication date.

sctb|8 years ago

Thanks, we've updated the title!

bootcat|8 years ago

Really good read ! Must read for people who like algorithms and competitive programming !