> Probably the strongest arguments for B-trees over hash indexes pertain to multi-field indexes and to nonuniform distributions of key values. A hash index on multiple fields requires search keys for all those fields such that a hash value can be calculated. A B-tree index, on the other hand, can efficiently support exact-match queries for a prefix of the index key, i.e., any number of leading index fields
Yes, well, if none of the columns/fields in the index are sufficiently selective, then the ability to search a B-tree by prefix may not do you much good. Don't get me wrong though: I fully agree with all of the given reasons for B-tree superiority to hash tables.
Although the newest storage technologies (particularly very fast byte-addressable NVMs) -- too new for TFA -- might well change things. When you have fast byte-addressable storage it will pay to not read pages, and hash tables may yet involve fewer accesses than B-trees in that case, though I suspect B-trees will successfully adapt anyways.
One thing that doesn't make the two interchangeable is total ordering and range queries, which are only supported by B-trees and not by hash indexes. So the question is whether you're willing to trade that away for faster insertion and lookup.
> When you have fast byte-addressable storage it will pay to not read pages
It always pays to not read pages :-) An alternative way to look at it is that faster storage is more forgiving of more accesses, meaning the benefits of hash indexes might be less prominent.
A lot of folks talk about byte addressability as if it's going to be some massive advantage. It's not, simply for the reason that it's inefficient to store byte-level addresses for everything.
(And of course, you're really only getting cache-line atomicity at the hardware layer. Ignoring this fact, and just using arbitrary byte boundaries for accesses, will kill a lot of perf gain of NVMs.)
Well, it's an interesting survey. I'm surprised at how much space it devoted to illustrating B-trees in the context of RDBMSs as a lot of the concepts there really had no specific connection to B-trees.
So much space devoted to logging, locking and latches. All heavily used in the industry up till the last decade. All obsolete now. Not a single mention of immutable/persistent B-trees. Were there really no academic papers on the topic written in a time covered by this survey?
This is from 2011, in case you missed that. I was confused by sentences like:
>Flash memory, flash devices, and other solid state storage technology are about to change the memory hierarchy in computer systems in general and in data management in particular.
makmanalp|8 years ago
https://sigmod.org/sigmod-awards/people/goetz-graefe-2017-si...
He also has a lot to say about locking in B-trees:
http://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe...
cryptonector|8 years ago
Yes, well, if none of the columns/fields in the index are sufficiently selective, then the ability to search a B-tree by prefix may not do you much good. Don't get me wrong though: I fully agree with all of the given reasons for B-tree superiority to hash tables.
Although the newest storage technologies (particularly very fast byte-addressable NVMs) -- too new for TFA -- might well change things. When you have fast byte-addressable storage it will pay to not read pages, and hash tables may yet involve fewer accesses than B-trees in that case, though I suspect B-trees will successfully adapt anyways.
kristianp|8 years ago
Persistent B+-Trees in Non-Volatile Main Memory http://www.vldb.org/pvldb/vol8/p786-chen.pdf
How to Build a Non-Volatile Memory Database Management System https://www.cs.cmu.edu/~jarulraj/papers/2017.nvm.sigmod.pdf
makmanalp|8 years ago
> When you have fast byte-addressable storage it will pay to not read pages
It always pays to not read pages :-) An alternative way to look at it is that faster storage is more forgiving of more accesses, meaning the benefits of hash indexes might be less prominent.
hyc_symas|8 years ago
(And of course, you're really only getting cache-line atomicity at the hardware layer. Ignoring this fact, and just using arbitrary byte boundaries for accesses, will kill a lot of perf gain of NVMs.)
hyc_symas|8 years ago
So much space devoted to logging, locking and latches. All heavily used in the industry up till the last decade. All obsolete now. Not a single mention of immutable/persistent B-trees. Were there really no academic papers on the topic written in a time covered by this survey?
mistercow|8 years ago
>Flash memory, flash devices, and other solid state storage technology are about to change the memory hierarchy in computer systems in general and in data management in particular.
until I looked up at the publication date.
sctb|8 years ago
bootcat|8 years ago