So I've thought about it and I don't really feel like spending more time to convince you that this works. If you have questions I am happy to answer them, but please write your own code.
That's fine, and thank you! I am playing around with the idea, and in theory it all looks good. My only concern is that operations like "first non ..." often involve branching, which hurts the CPU's branch prediction. That is why I kindly invited you to show it in code.
You can find the first set bit in an integer with a single machine instruction, and it's completely branch-free. GCC provides __builtin_ctz() for this (note that its result is undefined for an input of 0). To handle all matches, you'll either need to iterate over the set bits (so one branch per set bit) or use a compress instruction (requiring AVX-512) to turn the bit set into a set of integers.
That said, as you seem to actually want to do something with the results, you'll take a branch per match anyway, so I don't see the problem.
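For illustration, here is a minimal sketch of the "iterate over all set bits" variant, assuming GCC or Clang and a 32-bit match mask. The function name set_bit_indices and the output buffer are hypothetical and only there for the example; the AVX-512 compress variant is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any library): writes the index of every
 * set bit in `mask` to `out`, lowest bit first, and returns the count.
 * `out` must have room for 32 entries. */
size_t set_bit_indices(uint32_t mask, unsigned out[32])
{
    size_t n = 0;

    /* This loop condition is the one branch taken per set bit. It also
     * guards __builtin_ctz, whose result is undefined for 0. */
    while (mask != 0) {
        out[n++] = (unsigned)__builtin_ctz(mask); /* index of lowest set bit */
        mask &= mask - 1;                         /* clear that bit */
    }
    return n;
}
```

Calling it with a mask of 0x29 (binary 101001) should fill the buffer with 0, 3, 5 and return 3. The loop runs once per match rather than once per bit position, so the branch count scales with the number of matches.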
zokrezyl|2 months ago
clausecker|2 months ago