...This is news? You've got a tight loop with a branch, and depending on which way the branch goes, you're going to touch completely different locations in memory.
The solution (at least for some problems) is just to lay your data out in memory better - optimal is a binary search tree in an array, like the way heaps are implemented.
I have some really ugly things to say to a new poster who cannot be bothered to read an article before trying to show off, but I'll refrain. In the future, read first, think, then comment. You'll be doing everyone else a favor.
Using the heap layout, you can sometimes even eliminate branch misspredictions and subsequent (expensive) pipeline flushes.
The basic idea is to allow your compiler to use predicated instructions. Do do this, you have to transform control dependencies (e.g., `if (a > b)`) to data dependencies on index computations (e.g., `j = 2*j + (a > b)`).
> Optimal is a binary search tree in an array, like the way heaps are implemented.
Not quite. The heap layout has bad spatial locality. Think about the last row of leaves - it occupies the last 50% of the array. The next-to-last-row (all their parents) is in the middle 25%-50%. All those leaves are far away from their parents, and thus the heap is not optimal.
Better is to cluster parents and children into small groups. With clusters of three, two thirds of the time a child will adjacent to its parent.
[+] [-] brooksbp|13 years ago|reply
A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation
http://www.cs.amherst.edu/~ccm/cs34/papers/ladnerbst.pdf
[+] [-] koverstreet|13 years ago|reply
The solution (at least for some problems) is just to lay your data out in memory better - optimal is a binary search tree in an array, like the way heaps are implemented.
That's what I did for bcache: http://evilpiepirate.org/git/linux-bcache.git/tree/drivers/m...
[+] [-] dmpk2k|13 years ago|reply
I have some really ugly things to say to a new poster who cannot be bothered to read an article before trying to show off, but I'll refrain. In the future, read first, think, then comment. You'll be doing everyone else a favor.
[+] [-] steerb|13 years ago|reply
The basic idea is to allow your compiler to use predicated instructions. Do do this, you have to transform control dependencies (e.g., `if (a > b)`) to data dependencies on index computations (e.g., `j = 2*j + (a > b)`).
For a complete explanation, see http://algo2.iti.kit.edu/sanders/papers/ssss.ps.gz
[+] [-] keenerd|13 years ago|reply
Not quite. The heap layout has bad spatial locality. Think about the last row of leaves - it occupies the last 50% of the array. The next-to-last-row (all their parents) is in the middle 25%-50%. All those leaves are far away from their parents, and thus the heap is not optimal.
Better is to cluster parents and children into small groups. With clusters of three, two thirds of the time a child will adjacent to its parent.
[+] [-] chiraagj|13 years ago|reply
[deleted]
[+] [-] losethos|13 years ago|reply
[deleted]