Facebook released their C++ library, Folly, last year as well. An interesting feature they added is that its vector class is "aware" of the memory allocator; i.e., if jemalloc is available, the vector class will attempt to size its dynamic memory structures to fit within cache hierarchy sizes.
Reading C++ code always gives me a weird feeling that something is inherently wrong with the language. Header files containing nearly all of the implementation, because that's the only place you can define templates, is just odd... (https://code.google.com/p/cpp-btree/source/browse/btree.h)
// Inside a btree method, if we just call swap(), it will choose the
// btree::swap method, which we don't want. And we can't say ::swap
// because then MSVC won't pickup any std::swap() implementations. We
// can't just use std::swap() directly because then we don't get the
// specialization for types outside the std namespace. So the solution
// is to have a special swap helper function whose name doesn't
// collide with other swap functions defined by the btree classes.
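The workaround that comment describes is essentially the standard "using std::swap" two-step. A minimal sketch (the helper name follows the comment above; the body is illustrative):

```cpp
#include <utility>

// Pulling std::swap into scope and then making an *unqualified* call lets
// argument-dependent lookup find a type's own swap() overload, while still
// falling back to std::swap -- and the distinct name avoids colliding with
// btree::swap inside btree's own methods.
template <typename T>
inline void btree_swap_helper(T &a, T &b) {
  using std::swap;  // fallback for types with no custom swap
  swap(a, b);       // ADL picks up any swap() in T's namespace
}
```

Inside btree's methods, a call like `btree_swap_helper(x, y)` then resolves correctly for both standard and user-defined types.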
Also, the use of the extern keyword for templates, which is valid according to the standard, is limited by compiler compliance. Kind of a chicken-and-egg problem, not a language problem.
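For the curious, a minimal sketch of what `extern template` buys you (the btree class here is hypothetical, just for illustration):

```cpp
// btree.h (sketch): every translation unit that includes this sees the
// definition, but the extern declaration forbids implicit instantiation
// of btree<int>, so its code is generated only once.
template <typename T>
class btree {
 public:
  void insert(const T &value) { last_ = value; }
  T last() const { return last_; }
 private:
  T last_{};
};
extern template class btree<int>;  // don't instantiate in this TU

// btree.cc (sketch): exactly one TU provides the explicit instantiation.
template class btree<int>;
```

Every other translation unit then links against that single instantiation instead of compiling its own copy, which is the build-time win.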
I have always wondered whether a grand rethinking of compilers/linkers wasn't in order given the radically different face of computing and computational resources available now as opposed to 30+ years ago.
Yes, it is such a flawed language that has the declaration and the definition all in one place together. What other language does this, aside from Ruby, Python, Perl, JavaScript, Java, C#, Smalltalk, VisualBasic, PHP, Haskell, Lua...
And this is why you need to know all the algo and data structure "basics" when interviewing at the big G. Because people there are working on these sorts of optimizations in their spare time.
Agreed. When I was working on Google App Engine, the Quota team sat next to me. The Quota team runs the service that just about every other service at Google calls to ensure a user isn't abusing "The Google". That means they need insanely small response times to ensure that all those services run on time.
They hit an issue no one really thinks about: resizing hash tables. No one thinks about it because it happens transparently under the surface, but for them, the few milliseconds it took to resize a hash table would have had enormously detrimental effects on just about every other Google service. The solutions they were talking about were really quite amazing. That's the insanity they face.
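The comment doesn't say what the Quota team actually did, but one standard way to bound that resize latency is incremental rehashing (Redis uses this): keep the old bucket array alive and migrate a couple of buckets per operation instead of rehashing everything at once. A toy sketch, with illustrative names and no duplicate-key handling:

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Toy incremental-rehash map: when the table grows, the old bucket array
// is kept around and a few buckets are migrated per operation, bounding
// the worst-case latency of any single insert or lookup.
template <typename K, typename V>
class IncrementalMap {
  using Bucket = std::list<std::pair<K, V>>;
  std::vector<Bucket> old_, new_;
  std::size_t migrated_ = 0;  // buckets of old_ already moved
  std::size_t size_ = 0;

  static std::size_t slot(const K &k, std::size_t n) {
    return std::hash<K>{}(k) % n;
  }

  // Move up to `steps` old buckets into the new table.
  void migrate_some(std::size_t steps = 2) {
    while (!old_.empty() && steps-- > 0 && migrated_ < old_.size()) {
      for (auto &kv : old_[migrated_])
        new_[slot(kv.first, new_.size())].push_back(std::move(kv));
      old_[migrated_++].clear();
    }
    if (!old_.empty() && migrated_ == old_.size()) {
      old_.clear();  // migration finished
      migrated_ = 0;
    }
  }

 public:
  IncrementalMap() : new_(8) {}

  void insert(const K &k, V v) {
    migrate_some();
    if (old_.empty() && size_ + 1 > new_.size()) {  // start a resize
      old_ = std::move(new_);
      new_.assign(old_.size() * 2, Bucket{});
      migrated_ = 0;
    }
    new_[slot(k, new_.size())].emplace_back(k, std::move(v));
    ++size_;  // sketch only: assumes keys are unique
  }

  V *find(const K &k) {
    migrate_some();
    if (!old_.empty())
      for (auto &kv : old_[slot(k, old_.size())])
        if (kv.first == k) return &kv.second;
    for (auto &kv : new_[slot(k, new_.size())])
      if (kv.first == k) return &kv.second;
    return nullptr;
  }

  std::size_t size() const { return size_; }
};
```

Every insert and find pays a small constant migration cost, so no single call ever eats the full O(n) rehash.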
So, if anyone hasn't seen Google Sparsehash[1], check it out: it's another near drop-in improvement to the STL. Two bits per entry of overhead and insanely fast. I can't remember if the Quota team used it specifically, but I wouldn't be surprised if it's one of the results of their work.

[1]: http://code.google.com/p/sparsehash/
What does knowing this stuff at interview-time have to do with being able to implement it later? I certainly don't remember how b-trees work off the top of my head, but what I do know is how to find out the answer given a computer and an internet connection.
Sorted arrays are a common choice I've seen people reach for in lieu of std::set, but a proper B-tree implementation kicks butt compared to a sorted array when mutating data. And an unsorted array actually tends to kick butt for small sizes (<100 elements).
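For concreteness, the sorted-array approach is just std::lower_bound over a std::vector. A minimal sketch:

```cpp
#include <algorithm>
#include <vector>

// Sorted-vector "set": O(log n) membership tests like std::set, but the
// contiguous storage touches far fewer cache lines. The price is O(n)
// insertion, which is exactly where a B-tree pulls ahead under heavy
// mutation.
bool contains(const std::vector<int> &v, int key) {
  auto it = std::lower_bound(v.begin(), v.end(), key);
  return it != v.end() && *it == key;
}

void insert_unique(std::vector<int> &v, int key) {
  auto it = std::lower_bound(v.begin(), v.end(), key);
  if (it == v.end() || *it != key) v.insert(it, key);  // shifts the tail
}
```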
The STL guarantees both iterator and address stability for the elements of a set, map, or the unordered variants. Those guarantees can't be provided by many more compact representations.
Note that this rules out more than just B-trees: open addressed hash tables and judy array like structures have the same issue.
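The std::set guarantee can be checked directly; the same test would fail for any container that packs multiple elements into shared nodes:

```cpp
#include <set>

// std::set is node-based: inserting more elements rebalances the tree but
// never moves an existing element in memory, so pointers to elements stay
// valid. A compact B-tree shifts elements around within its nodes on
// insert and so cannot make this promise.
bool addresses_stable_after_inserts() {
  std::set<int> s{-1};
  const int *p = &*s.find(-1);                   // remember the address
  for (int i = 0; i < 100000; ++i) s.insert(i);  // force many rebalances
  return p == &*s.find(-1);                      // same node, same address
}
```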
It says in the article that B-trees are advantageous when the cost of cache misses outweighs the cost of comparisons, e.g., in the case of integers. By comparison, a red-black tree does fewer comparisons, and a hash table is faster on average but unordered (more differences?).
Without considering cache misses, a red-black tree is much faster than a B-tree. So this B-tree implementation must be tailored to the cache-line size, and it may perform badly on some CPUs.
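A rough way to see the trade-off (the fanout and element count below are illustrative assumptions, not measurements): each node visited is a potential cache miss, and a B-tree trades more in-node comparisons for far fewer node visits.

```cpp
#include <cmath>

// Approximate node visits for a search over n keys: a binary tree visits
// ~log2(n) nodes (one potential cache miss each), while a B-tree with a
// large fanout visits ~log_fanout(n) nodes, doing its extra comparisons
// inside cache lines it has already fetched.
double node_visits(double n, double fanout) {
  return std::log2(n) / std::log2(fanout);
}
```

For n = 1,000,000: `node_visits(n, 2)` is about 20 for a red-black tree versus about 3.3 for a B-tree with 64 keys per node, which is why "fewer comparisons" stops winning once memory latency dominates.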
The performance gain from the B-tree container also depends largely on the size of the value type. So choose this one carefully.
I don't really think it's possible to say one is better than the other without a specific application in mind. They probably just chose whichever one they wanted at the time.
I'm confused. From my understanding, B-trees are good when you load data from a hard drive: they minimize the number of times you have to load a range of data from disk.
Since this C++ B-tree lives in RAM, where loading data should matter much less, how does it benchmark so well? What is the secret ingredient?
http://news.ycombinator.com/item?id=4059356
Finally, you might investigate the extern template keyword in C++11: http://en.wikipedia.org/wiki/C++11#Extern_template
http://www.boost.org/doc/libs/1_52_0/doc/html/container/non_...
http://people.cs.aau.dk/~simas/dat5_07/papers/p475-rao.pdf
Or are the optimisations mentioned there still valid?