top | item 36624742

(no title)

414owen | 2 years ago

That's the thing, a C compiler has all the information it needs to know that the maximum amount of times a '\0' can be processed in the loop is once (because the function returns), but there's no upper bound on the amount of times other characters are seen in the loop.

I might be missing a reason that this information of opaque to the compiler though, in which case, this section of the article is indeed lacking, but I'm happy to learn :)

discuss

cjensen|2 years ago

It's not just that the C compiler lacks the information... but the reader of this article also lacks this information.

String length tells you the frequency with which nul terminators will be found. Without knowing frequency of occurrence of the nul terminator, 's', and 'p' then you cannot know which one occurs most often.

Consider two benchmark cases: (1) every string tested contains exactly one character (2) every string tested is 1MB long and is composed entirely of 's' and 'p'.

The author's first "optimization" assumes nul is rare. It would make benchmark (1) worse, and (2) better.

The article is a good example of "specification is hard, code is easy." He insufficiently specified the problem to be solved, and his test cases contained information not in the code and not in the text of the article.

414owen|2 years ago

I guess the question is whether the compiler should optimize a function containing a loop for a single null terminator, or for more data.

I would suggest the latter is what you want most of the time.

There's also the option of running a quick check for the null terminator before the loop, and then optimizing the loop for the other options.

But in any case, I think the demonstration of the technique of rearranging branches is interesting, and I needed a program to apply it to.

aengvs|2 years ago

It's not the upper bound that matters but the frequency. How frequently should the compiler assume an 's' appears in the dataset, or any other character?

We know that E[# of '\0' in a string] == 1.

But what is E[# of 's' in a string]? Is it greater or less than E[# of '\0' in a string], and how should the compiler know this?

You haven't given the compiler any reason to assume that 's' or 'p' will appear more often than '\0'.