In a realistic application (with cache pressure) you may find an array performs even better because the CPU can trivially predict and pipeline the entire operation vs jumping around to random locations in memory.
Another approach is a B-tree with lots of nodes stored at the leaves. This can get you the array's linear performance for small datasets, while preserving other big-O benefits of sets/trees (eg insertion time).
As the size of your dataset grows, there are performance cliffs at the L1, L2, and L3 cache sizes. For very large datasets, creating data structures with predictable access patterns often trumps Big-O entirely. If following a linked list thrashes the cache, it turns out the constant factor can and does dominate.
So again, always measure with your application on your intended hardware. Be sure you measure with realistic loads and access patterns.
These results don't make any sense and I don't believe them.
The author is generating random strings of length 64. They then generate N random strings and search for them. The probability of finding a match is effectively zero.
> Interestingly, up to about 75 elements, the vector actually outperforms both sets.
No way. The unordered_set will perform one hash, one lookup, and return false. That is not going to be slower than SEVENTY FIVE string comparisons.
I don't know how Google Benchmark works. So I wrote my own simple test. I could not find any combination of string length and num searches where a vector was faster than an unordered_set.
No it won't for such small strings. A 64-byte string likely fits in a cache line, so you're doing a cache-line comparison, which the CPU may even handle as a handful of micro-ops.
Alternatively, SSE or AVX can be used to vector-compare the strings, which is still really fast.
I suspect the author made critical mistakes too.
If you are actually hitting a match or do many operations with warmed cache, vector may actually win big time even compared to unordered_set as there may be no pointer chasing involved. (since both all keys and all values are in cache)
The vector scan essentially boils down to a memcmp for a sufficiently smart compiler.
Computing the hash may or may not be faster than this.
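To make the two code paths concrete, here is a minimal sketch of what each lookup boils down to. The function names (`lookup_vec`, `lookup_set`) are mine, not from the benchmarked code:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Vector path: keys sit contiguously, the scan touches memory sequentially,
// and a sufficiently smart compiler can lower the inner comparison to memcmp.
bool lookup_vec(const std::vector<std::string>& keys, const std::string& needle) {
    return std::find(keys.begin(), keys.end(), needle) != keys.end();
}

// Hash path: pays for hashing the needle up front, then (in the common
// node-based implementations) chases a bucket pointer to a separately
// allocated node before the final comparison.
bool lookup_set(const std::unordered_set<std::string>& keys, const std::string& needle) {
    return keys.find(needle) != keys.end();
}
```

Which one wins depends on element count, string length, and whether the data is already in cache, which is exactly why the crossover point is worth measuring rather than assuming.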
Kinda makes you wonder why those decisions (to provide bucket iteration and bucket iterators) were made. It's hard to picture a use for such a feature that justifies adding constraints to a standard container. I would love it if someone could provide some insight into the decision process that led to this state of affairs.
- Compiler, c++ library, compilation options, and platform are missing
- The author doesn't benchmark vector and unordered_map insertions with reserve()
- If I read correctly, the author benchmarks up to 32,768 items. This is way too low. There should be tests with millions of entries.
- Random string/numbers isn't a realistic scenario and very favourable to unordered_map. I would use data from real life examples, e.g. list of cities, list of people, etc.
- Ordered arrays are missing from the benchmark (e.g. boost::container::flat_set), it's a very powerful data structure. Why just do a sorted vector at the end?
- Vector outperforms unordered_set probably because of the hashing function used.
It is very rare that I have seen a use of std::set that seemed justified, although it does happen. People just use it because it's easier to type than unordered_set!
One way to speed up set's usually appalling performance is to give it a custom allocator so that instead of N calls to allocate tiny tree nodes all over the heap, you get all your tree nodes on an arena. This can dramatically speed up short-lived sets (for example you are trying to build the unique set of some non-unique input).
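A minimal sketch of the arena idea using C++17's `std::pmr` facilities (a hand-rolled arena allocator passed through the set's allocator parameter would achieve the same thing); the function name and buffer size are illustrative:

```cpp
#include <cstddef>
#include <memory_resource>
#include <set>

// All tree nodes come out of one monotonic buffer instead of N individual
// heap allocations, and the whole arena is released in one shot when the
// resource goes out of scope -- exactly the short-lived "unique set of some
// non-unique input" scenario described above.
std::size_t count_unique(const int* data, std::size_t n) {
    std::pmr::monotonic_buffer_resource arena(64 * 1024);  // one upfront block
    std::pmr::set<int> unique(&arena);                     // nodes allocated from the arena
    for (std::size_t i = 0; i < n; ++i)
        unique.insert(data[i]);
    return unique.size();                                  // arena freed wholesale on return
}
```

The win comes both from cheaper allocation (bump-pointer instead of malloc) and from the nodes ending up adjacent in memory, which helps the traversal's cache behavior.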
> In one setup, I used std::strings of length 64 and in the other I used 64 bit numbers. The main motivation behind measuring both numbers and strings was that comparing 64 bit numbers takes a constant number of clock cycles
For longish strings (length 64 qualifies), binary_search is missing an important optimization: it ends up walking the same prefix over and over.
A better algorithm is called "radix quicksort with a three-way partition." Rather than the classical qsort < and >= partition, you maintain three regions of <, =, and >. Crucially, at each step, you sort based only on a single character: the whole range via the first char, then the = region via the second char, then that = region via the third char, etc. This allows for super-fast comparisons without re-walking prefixes.
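A sketch of the algorithm described above, i.e. multikey three-way radix quicksort after Bentley and Sedgewick (function names are mine):

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace {
// Character at position d, or -1 past the end so shorter strings sort first.
int char_at(const std::string& s, std::size_t d) {
    return d < s.size() ? static_cast<unsigned char>(s[d]) : -1;
}

// Partition on the single character at depth d into <, =, > regions.
// By the recursion invariant, every string in [lo, hi] already shares
// its first d characters, so prefixes are never re-compared.
void sort3way(std::vector<std::string>& a, int lo, int hi, std::size_t d) {
    if (hi <= lo) return;
    int lt = lo, gt = hi;
    int pivot = char_at(a[lo], d);
    int i = lo + 1;
    while (i <= gt) {
        int c = char_at(a[i], d);
        if (c < pivot)      std::swap(a[lt++], a[i++]);
        else if (c > pivot) std::swap(a[i], a[gt--]);
        else                ++i;
    }
    sort3way(a, lo, lt - 1, d);                  // < region, same character
    if (pivot >= 0) sort3way(a, lt, gt, d + 1);  // = region, next character
    sort3way(a, gt + 1, hi, d);                  // > region, same character
}
}  // namespace

void radix_quicksort(std::vector<std::string>& a) {
    if (!a.empty()) sort3way(a, 0, static_cast<int>(a.size()) - 1, 0);
}
```

The same partitioning structure adapts to searching: the query string's prefix is consumed one character at a time instead of being re-walked on every comparison.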
All of these approaches (std::binary_search, std::set, and std::unordered_set) are quite sub-optimal. The design of unordered_set, with its separately allocated bucket nodes, is especially distressing.
For a much more efficient take, see LLVM's DenseSet which does NOT separately allocate buckets. This burdens the client with finding two values to represent empty and tombstone entries, but if perf is a concern that's much better than additional mallocs.
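A heavily simplified sketch of that design for int keys. This is not LLVM's actual DenseSet, just the core idea: a flat open-addressed table with client-supplied empty and tombstone sentinels and no per-entry allocation. Resizing is omitted, so it assumes the load stays below capacity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class FlatIntSet {
    static constexpr int kEmpty = -1;      // sentinel: client promises never to insert it
    static constexpr int kTombstone = -2;  // sentinel: marks erased slots
    std::vector<int> slots_;

public:
    explicit FlatIntSet(std::size_t capacity_pow2 = 64) : slots_(capacity_pow2, kEmpty) {}

    bool insert(int key) {
        std::size_t mask = slots_.size() - 1;
        std::size_t i = static_cast<std::size_t>(key) & mask;
        std::ptrdiff_t first_free = -1;
        while (slots_[i] != kEmpty) {
            if (slots_[i] == key) return false;               // already present
            if (slots_[i] == kTombstone && first_free < 0)
                first_free = static_cast<std::ptrdiff_t>(i);  // remember reusable dead slot
            i = (i + 1) & mask;                               // linear probe
        }
        slots_[first_free >= 0 ? static_cast<std::size_t>(first_free) : i] = key;
        return true;
    }

    bool contains(int key) const {
        std::size_t mask = slots_.size() - 1;
        std::size_t i = static_cast<std::size_t>(key) & mask;
        while (slots_[i] != kEmpty) {          // tombstones keep the probe chain alive
            if (slots_[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    bool erase(int key) {
        std::size_t mask = slots_.size() - 1;
        std::size_t i = static_cast<std::size_t>(key) & mask;
        while (slots_[i] != kEmpty) {
            if (slots_[i] == key) { slots_[i] = kTombstone; return true; }
            i = (i + 1) & mask;
        }
        return false;
    }
};
```

Every probe lands in one contiguous array, so there are no extra mallocs and no pointer chasing; the cost is the sentinel contract and the need to rehash when tombstones accumulate.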
It seems like many commenters here (the ones who aren’t questioning the methodology) are drawing the lesson that you should use vector rather than set.
From these charts, isn’t unordered_set clearly the best all-rounder? Which is exactly as I’d expect. Am I missing something?
Sure, a simple vector might be a bit faster for 10 or 100 or 1000 elements, but after a certain point performance is awful.
Definitely check out flat_set too! Most of the time using a sorted vector as the backing store for these kinds of things ends up being a net win, especially for smaller sets, but the size at which it starts to become a hindrance may surprise you.
By writing my own implementation (to avoid a Boost dependency) I kinda understood why they aren't standard: the iterators have different invalidation rules, there are some useful properties specific to the sorted-vector representation, the map/multimap keys can't be const as in other standard associative containers, etc.
Very interesting. This makes it sound like there should be a way to tell `std::lower_bound` to switch to a linear scan once it has narrowed the range down enough. (In the specific example, "enough" is about 125 elements.)
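A sketch of what such a hybrid could look like. This is not a standard facility; the name, default cutoff, and interface are made up for illustration:

```cpp
#include <cstddef>

// Binary-search until the window shrinks below a cutoff (the ~125-element
// figure above is workload-dependent), then finish with a linear scan the
// CPU can predict and prefetch. Same contract as std::lower_bound.
template <typename It, typename T>
It hybrid_lower_bound(It first, It last, const T& value, std::size_t linear_cutoff = 128) {
    auto len = static_cast<std::size_t>(last - first);
    while (len > linear_cutoff) {            // classic binary-search halving
        std::size_t half = len / 2;
        It mid = first + half;
        if (*mid < value) {
            first = mid + 1;
            len -= half + 1;
        } else {
            len = half;
        }
    }
    while (first != last && *first < value)  // predictable linear tail
        ++first;
    return first;
}
```

The crossover value would have to be tuned per element type and hardware, which is presumably why the standard algorithm doesn't bake one in.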
Likewise for a hash table with separate chaining, wouldn't one way to do better than `std::unordered_set` be to keep each chain as a `std::vector` that is kept sorted, and then use `std::lower_bound` (or the above better version of it) to probe it? Those chains are O(1) in size, so lookup and insertion are still O(1) on average. Worst case goes to O(n log n), but that doesn't seem too bad.
Related: Is there a widely-used "frozen set" C++ data structure? I'm picturing something that is constructed from a sequence but that internally is a sorted `std::vector<T>` that provides a set-like API but that uses `std::lower_bound` to find data. It's easy enough to write, but it seems like it should be in the standard or at least Boost.
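A minimal sketch of such a frozen set. The class name is made up; `boost::container::flat_set` is the closest existing equivalent, and this covers only the read path:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Built once from a sequence, stored as a sorted, deduplicated
// std::vector<T>, queried via std::lower_bound. No mutation after
// construction, hence "frozen".
template <typename T>
class frozen_set {
    std::vector<T> sorted_;

public:
    explicit frozen_set(std::vector<T> items) : sorted_(std::move(items)) {
        std::sort(sorted_.begin(), sorted_.end());
        sorted_.erase(std::unique(sorted_.begin(), sorted_.end()), sorted_.end());
    }

    bool contains(const T& value) const {
        auto it = std::lower_bound(sorted_.begin(), sorted_.end(), value);
        return it != sorted_.end() && *it == value;  // lower_bound may overshoot
    }

    std::size_t size() const { return sorted_.size(); }
};
```

Because the storage is one contiguous allocation, iteration and lookup get the cache behavior discussed throughout this thread, at the cost of giving up cheap insertion.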
The use of unsorted vectors for sets was covered and benchmarked in a Scott Meyers [0] post from 2015. Facebook's folly library supports linear containers and maps for these small sets (~100 elements, as a rule of thumb, but measure), and I've rolled my own for specific applications.
The article should say which C++ compiler and standard library they're using. Each implementation is different, and these results may not be true for all of them.
It's actually not hard to beat the performance of standard set and map containers in specific scenarios. For example in one application I have a treap implementation that's much faster than the standard ordered map, but it is much less featureful (doesn't support deletion for example).
Comparing lookup time of unsorted vector and sorted set is unfair. The vector should be sorted and then we might compare plain binary search against BST performance.
It is not unfair, and there is a comparison with a sorted vector as well.
And the author correctly identified why the sorted vector performs better than the unsorted vector.
The cache friendliness is key. And it shows that for small element counts an array or vector would perform better. It simply depends on your needs.
I would be interested in comparing a stack-based array of integers with the sets as well.
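For reference, the comparison being asked about could be set up along these lines; the names and the size are illustrative, and the timing harness is left out:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

constexpr std::size_t kN = 64;  // small, fully stack-resident

// Contiguous on-stack storage: no heap allocation, no pointer chasing,
// and the linear scan is trivially prefetchable.
bool in_stack_array(const std::array<int, kN>& a, int x) {
    return std::find(a.begin(), a.end(), x) != a.end();
}

// Red-black tree: O(log n) comparisons, but each step is a hop to a
// separately allocated node somewhere on the heap.
bool in_tree_set(const std::set<int>& s, int x) {
    return s.find(x) != s.end();
}
```

Wrapping each function in the benchmark harness of your choice would answer the question directly for your hardware, which is the thread's recurring advice anyway.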
AstralStorm | 7 years ago
std::array is expected to have almost identical performance characteristics.
forrestthewoods | 7 years ago
Looking at the code, I _think_ the randString() call for each of the N search strings is included in the timing. That, uh, is not good? Line 29: https://github.com/akshaynanavati/benchmarks/blob/master/ben...
The simple test I described above is at https://pastebin.com/raw/mfBmGa5B
Example results:
String length: 8 | Num elements: 75 | Num searches: 10000 | Gen time: 1.94252 | Vec time: 4.12721 | Set time: 0.144987 | Num found: 0
chillee | 7 years ago
Using one of the policy-based data structures instantly gets you about an order of magnitude of speedup (for accesses).
Jyaif | 7 years ago
You have to draw the line somewhere, and this post is very interesting as it is.
dagenix | 7 years ago
Pedantic: comparing 64 character strings is O(1).
[0] https://scottmeyers.blogspot.com/2015/09/should-you-be-using...
halayli | 7 years ago
C++ sets define the interfaces and not the implementation.
shaklee3 | 7 years ago
https://github.com/sparsehash/sparsehash
I read a while ago that it beat all the competitors at almost every size.