wandermatt | 13 years ago
From his 2011 NEScala talk on functional data structures, covering practical considerations like locality of references and caching. Available on vimeo http://vimeo.com/20262239
fusiongyro|13 years ago
"Both Haskell's built-in list type and the DList type that we defined above have poor performance characteristics under some circumstances. The Data.Sequence module defines a Seq container type that gives good performance for a wider variety of operations." -- http://book.realworldhaskell.org/read/data-structures.html#d...
carterschonwald|13 years ago
I may wind up using this or a similar data structure in a number of current commercial and open source projects.
dons|13 years ago
Individual implementations might suck. Others are just fine (http://hackage.haskell.org/packages/archive/fingertree/0.0/d...) - though even this could be optimized (for constant factors).
marssaxman|13 years ago
wandermatt|13 years ago
If you don't want to watch the video, a version of this talk was also given at Strange Loop, and the slides for that one are available online: http://www.infoq.com/presentations/Functional-Data-Structure...
AaronFriel|13 years ago
See this overview of the difficulty of building efficient data structures in Java: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/mem...
AaronFriel|13 years ago
Some takeaways from this slide deck: a TreeMap&lt;Double, Double&gt; has a minimum of 82% overhead. If you allocate 100 megabytes of memory to that tree map, only about one fifth of that memory is your data; the rest is object overhead. An array, on the other hand, carries only 16 bytes of overhead, and thus stores close to five times as much data per megabyte. Even if you implement some strange accounting scheme on top of your array to make it act like a tree, you're probably still reaping performance benefits from greater data locality, a higher likelihood of finding data in cache lines, etc.
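The 82% figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes typical HotSpot layout on a 64-bit JVM with compressed oops (12-byte object headers, 4-byte references, 8-byte alignment); actual sizes vary by JVM and flags, so treat the constants as illustrative.

```java
// Rough per-entry memory estimate for TreeMap<Double, Double>.
// Layout constants are typical 64-bit HotSpot values (compressed oops),
// not guarantees; a tool like JOL would measure the real numbers.
public class TreeMapOverhead {
    // Round an object size up to the 8-byte allocation boundary.
    static int align(int n) { return (n + 7) & ~7; }

    public static void main(String[] args) {
        // TreeMap.Entry: 12-byte header + 5 references
        // (key, value, left, right, parent) * 4 bytes + 1 boolean color.
        int entry = align(12 + 5 * 4 + 1);         // 40 bytes
        // Each boxed Double: 12-byte header + 8-byte payload, aligned.
        int boxedDouble = align(12 + 8);           // 24 bytes
        int perEntry = entry + 2 * boxedDouble;    // 88 bytes per mapping
        int payload = 2 * 8;                       // 16 bytes of real data
        double overhead = 1.0 - (double) payload / perEntry;
        System.out.printf("per entry: %d bytes, overhead: %.0f%%%n",
                perEntry, overhead * 100);
    }
}
```

Under these assumptions each mapping costs 88 bytes for 16 bytes of data, i.e. about 82% overhead, while a pair of `double[]` arrays would pay only two array headers for the whole collection.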
I'm reminded of when I worked with libraries for Judy arrays in 2006 or so. Judy arrays are a radix-tree-like data structure, optimized for cache line sizes and the number of dereferences at every node of the structure. Pointers to nodes are tagged in their lowest 3 bits to indicate which of 8 node types they point to. A very full node will usually be allocated as an array, while a node with fewer entries than roughly 3/4 of a cache line's worth of pointers stores an array of key bytes and, located adjacently, an array of pointers to child nodes. The byte array is searched first, and on a hit, the matching index is used to find the pointer to the child in the second array.
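That two-adjacent-arrays scheme can be sketched in a few lines. This is a minimal illustration of the idea, not Judy's actual node layout or API; the class and method names are invented for the example.

```java
// Sketch of a sparse radix-tree node: one byte per key in a compact
// array (cheap to scan, since it fits in a cache line or two), with
// the child references in a parallel array laid out right next to it.
public class LinearNode {
    private final byte[] keys;       // key bytes, searched first
    private final Object[] children; // child for keys[i] is children[i]

    LinearNode(byte[] keys, Object[] children) {
        this.keys = keys;
        this.children = children;
    }

    // Scan the byte array; on a hit, use the index to pick the child.
    Object lookup(byte key) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == key) return children[i];
        }
        return null; // no child for this key byte
    }

    public static void main(String[] args) {
        LinearNode node = new LinearNode(
                new byte[]{3, 17, 42},
                new Object[]{"a", "b", "c"});
        System.out.println(node.lookup((byte) 17)); // prints "b"
        System.out.println(node.lookup((byte) 9));  // prints "null"
    }
}
```

The point of the layout is that the search touches only the dense byte array, so a miss costs a handful of comparisons within memory that is already in cache, and the pointer array is dereferenced exactly once.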
All of these optimizations are really fantastic. Judy arrays not only have more "features" than hash tables, such as very efficiently stored arbitrary-length keys and very efficient access to the first, last, and random elements, but they can also be traversed in lexicographic order, which a conventional hash table cannot do. On top of this, in practice they outperform naive hash tables outright, thanks to being tuned for real-world processors' cache line sizes.
All of this is fantastic, except if you implemented this in Java it would be, ahem, "in practice it is very very very very very very very very very very very slow... very very slow".