fuber2018 | 2 years ago
But the compiler/CPU can process bytes either one at a time or, much faster, in groups. The code is trying to process as much of the string as possible in groups of 128 chars.
But since the caller can pass in a string whose length is not a multiple of 128, the first for-loop (& 127, i.e. length % 128) works out how many leading chars to process one at a time so that the remaining length is an exact multiple of 128.
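A quick sketch of the arithmetic behind those two masks, assuming an unsigned length (`size_t`). `split_length` is a hypothetical helper for illustration, not something from the original code: it shows that `n & 127` and `n >> 7` split a length into a leftover head and a count of full 128-byte blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split a length n into the leftover head handled by
   the first loop (n & 127) and the number of full 128-byte blocks handled
   by the second loop (n >> 7). For unsigned n these equal n % 128 and n / 128. */
void split_length(size_t n, size_t *head, size_t *blocks)
{
    *head = n & 127;   /* same as n % 128 */
    *blocks = n >> 7;  /* same as n / 128 */

    /* The two parts always add back up to the original length. */
    assert(*head + (*blocks << 7) == n);
}
```

For a 300-char string, for example, the head is 44 chars and there are 2 blocks of 128 (44 + 256 = 300).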
The second for-loop divides the length by 128 (>> 7) to find out how many 128-char blocks there are to process. The inner for-loop then processes each block of 128 chars, looking for 's' chars.
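The code being discussed isn't quoted here, so this is only a reconstruction of the loop structure described above, assuming C and a function that counts 's' chars in a buffer of `len` bytes. `count_s_naive` and `count_s_blocked` are hypothetical names; the naive version is included for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Naive version: one char per iteration. */
size_t count_s_naive(const char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += (s[i] == 's');
    return count;
}

/* Blocked version: first the len & 127 leftover chars one at a time,
   then len >> 7 full blocks of exactly 128 chars each. */
size_t count_s_blocked(const char *s, size_t len)
{
    const char *end = s + len;
    size_t count = 0;
    size_t head = len & 127;            /* chars that don't fill a whole block */

    for (size_t i = 0; i < head; i++)   /* first loop: the leftover chars */
        count += (s[i] == 's');
    s += head;

    for (size_t b = len >> 7; b > 0; b--) {  /* second loop: one pass per block */
        /* Inner loop with a fixed trip count of 128: an easy target for
           unrolling and auto-vectorization. */
        for (size_t i = 0; i < 128; i++)
            count += (s[i] == 's');
        s += 128;
    }

    assert(s == end);  /* the two phases together cover exactly len chars */
    return count;
}
```

Both functions return the same count; the only difference is the shape of the loops the compiler gets to work with.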
Now, the for-loop within a for-loop doesn't look any faster than the plain single for-loop, but I'd assume that the heuristics of certain compilers can recognize that they can generate code operating on multiple chars at the same time (SIMD instructions), since the result of each comparison is independent of the others.
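To make "multiple chars at the same time" concrete, here is a hand-written SSE2 sketch of roughly the kind of code a vectorizing compiler might produce. It assumes an x86 target with SSE2 and GCC or Clang (for `__builtin_popcount`); `count_s_sse2` is a hypothetical name, it compares 16 chars per instruction rather than working in 128-char blocks, and on other targets it just falls back to the scalar loop. It is not the original code, nor a claim about what any particular compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

size_t count_s_sse2(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t count = 0;
    size_t i = 0;

#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi8('s');  /* 16 copies of 's' */
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, needle);       /* 0xFF where byte == 's' */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);  /* one bit per byte */
        count += (size_t)__builtin_popcount(mask);        /* GCC/Clang builtin */
    }
#endif

    /* Scalar tail (or the whole string when SSE2 is unavailable). */
    for (; i < len; i++)
        count += (s[i] == 's');
    return count;
}
```

Each iteration of the vector loop does 16 comparisons in one instruction, which is where the speedup over the one-char-at-a-time loop comes from.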
On a compiler that cannot generate SIMD code, this version won't be much faster than the naive single loop, if at all.