I wonder why they didn't just make two separate routines with no branches other than the loop branch itself.
Then you would only need an extra safety margin to cover a misprediction on the last loop iteration.
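For illustration, a rough sketch of what that two-routine layout might look like. Everything here is an assumption: xdcbt is specific to the Xbox 360 CPU, so the GCC/Clang `__builtin_prefetch` builtin stands in for both prefetch flavors, and `LINE`, `AHEAD`, `SAFETY`, `copy_normal`, and `copy_special` are hypothetical names and values, not anything from the original code.

```c
#include <stddef.h>
#include <string.h>

#define LINE   128u          /* assumed cache line size in bytes */
#define AHEAD  (4u * LINE)   /* assumed prefetch distance */
#define SAFETY LINE          /* assumed margin for one mispredicted extra iteration */

/* Stand-ins only: the real "special" routine would issue xdcbt here. */
#define PREFETCH_NORMAL(p)   __builtin_prefetch((p), 0, 3)
#define PREFETCH_EXTENDED(p) __builtin_prefetch((p), 0, 0)

/* Ordinary routine: normal prefetch, the loop is the only branch. */
void copy_normal(char *dst, const char *src, size_t n) {
    size_t i = 0;
    size_t limit = n > AHEAD + LINE ? n - (AHEAD + LINE) : 0;
    for (; i < limit; i += LINE) {
        PREFETCH_NORMAL(src + i + AHEAD);
        memcpy(dst + i, src + i, LINE);
    }
    memcpy(dst + i, src + i, n - i);   /* tail, no prefetch */
}

/* "Special" routine: the prefetching loop stops SAFETY bytes earlier, so one
 * speculatively executed extra iteration still targets memory inside the
 * buffer. Returns the number of prefetches issued, purely for illustration. */
size_t copy_special(char *dst, const char *src, size_t n) {
    size_t issued = 0;
    size_t i = 0;
    size_t limit = n > AHEAD + SAFETY + LINE ? n - (AHEAD + SAFETY + LINE) : 0;
    for (; i < limit; i += LINE) {
        PREFETCH_EXTENDED(src + i + AHEAD);
        memcpy(dst + i, src + i, LINE);
        issued++;
    }
    memcpy(dst + i, src + i, n - i);   /* tail, no prefetch */
    return issued;
}
```

The margin only covers a mispredicted loop exit inside `copy_special`; it says nothing about speculation that enters the function from elsewhere.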
That would _probably_ be safe, but you have to be sure that the xdcbt instructions in the "special" function are far enough into the function that speculative execution can never reach them. Pipelines are way deeper than most people realize, so this might require a lot of instructions before the first xdcbt.
And then, for maximum performance, you want prefetch instructions to be as early as possible. So, you immediately have a contradiction.
And, assuming that you resolve this, there is still the risk that a mispredicted indirect branch could end up triggering an xdcbt. So, you end up with no guarantees anyway.
brucedawson|4 years ago
saagarjha|4 years ago