craigching|10 years ago
https://news.ycombinator.com/item?id=4317545
https://news.ycombinator.com/item?id=7243568
When I read this I immediately recognized it as a PDF that I have in GoodReader (where I store all my technical PDFs). So I'm curious if it will start a good conversation about b-trees this time :)
To get the conversation started: a couple of years ago, I decided to write a b-tree data structure to refresh my skills a bit. I wrote it in C since I've been doing a lot of Java for the last 10 years or so. I only got as far as an in-memory b-tree and had started working on getting it stored to disk before I abandoned the project (I think I was working on it during the Christmas holidays, and the holiday ended).
I subsequently read that in-memory b-trees can actually outperform other tree data structures like rb-trees due to cache locality. I never bothered to performance-test my b-tree against an rb-tree I also wrote for fun.
Anyway, that's my story about b-trees!
detrino|10 years ago
But ya, B-Trees will (sometimes significantly) outperform BSTs for small keys because they will have fewer cache misses. Also interesting: the idea that RB-Trees are faster than AVL trees for insert/delete may be outdated as well. I've seen benchmarks of AVL trees beating std::set in both libstdc++ and libc++ for both of those operations. Again, this is probably due to fewer cache misses.
My B-Tree story is actually a B+Tree story. B-Trees (and B+Trees) can actually be used for more than just ordered sets/maps; I've implemented a C++ sequence container[1] that allows O(log n) random access insertion and deletion. It is _much_ faster than a similar container[2] implemented using AVL trees, but at the cost of iterator stability.
[1] https://github.com/det/segmented_tree_seq
[2] http://avl-array.sourceforge.net
bbrazil|10 years ago
I did my final year project of my degree on B-Trees (Cache Conscious Optimisation of STL Data Structures).
The short version is that you want to tune the node size to the page size; inserts and reads tend to be faster than RB-Trees, but deletes are slower. This was true across a variety of chips and CPU architectures.
srean|10 years ago
It exposes the standard STL container interface. Although it was primarily intended to be used as a map or a set, if I remember correctly they also had B-tree-backed large vectors that outperformed std::vector in random reads. I seem to have lost the URL for that post/comment.
flgr|10 years ago
If you are interested in an overview of B-Trees, similar data structures, and how well they work on modern hardware, you may also find the survey bit of my master thesis interesting:
See here: https://www.researchgate.net/profile/Florian_Gross/publicati....
Along those lines:
* CSS-Trees: pointerless B-Trees with a layout optimized for cache lines (http://www.vldb.org/conf/1999/P7.pdf)
* Intel & Oracle's fast architecture-sensitive tree search (combines huge pages, cache line blocking, and SIMD in an optimal layout): http://www.researchgate.net/profile/Jatin_Chhugani/publicati...
* Adaptive radix trees (http://codematch.muehe.org/~leis/papers/ART.pdf)
tomcam|10 years ago
A lucid, enjoyable paper that allowed me to write a very fast B-Tree implementation in nascent ANSI C back in the day. Comer's books always struck just the right balance between scholarly completeness and clarity, at least to me.
rizzin|10 years ago
detrino|10 years ago
unknown|10 years ago
[deleted]
tomcam|10 years ago
[deleted]