top | item 36618345

(no title)

414owen | 2 years ago

A clickbait title for an in-depth look at hand-optimizing a very simple loop.

discuss

ftxbro|2 years ago

I'm not a compiler expert but if it's a "very simple loop" is it still too complex for the compiler to make good machine code? Did they use a bad compiler on purpose? Or are computers just not yet fast enough to do a good job with very simple loops in practical compilers?

moonchild|2 years ago

> are computers just not yet fast enough to do a good job with very simple loops in practical compilers?

The short answer to this question is 'yes', but there are some extenuating factors:

- Although we could do interesting things with unlimited computational resources, the current crop of c compilers is simply not very good, compared with what's possible today.

- Performance is always workload-dependent; the compiler has been somewhat shafted here because it doesn't know what sorts of inputs the function usually receives. The compiler output is better than the 'improved' code for some inputs. (It's possible you could get a better result from the existing compilers and c code just by using profile-guided optimisation.)

- The difference is prone to be more pronounced in simple loops than large ones. This is a contrived use-case. There is not a factor of 6 of performance hiding in optimised c code which could be recovered by doing the sorts of optimisations done by the op. Probably something more like 10-20%.

cjensen|2 years ago

The problem is the author of the article is making some huge implicit assumptions that the compiler can't possibly know about.

Consider this statement: "However, we know some things about this loop. We know that the only time we break out of it is when we hit the null terminator (’\0’). The code clang generates checks for the null terminator first, but this makes no sense."

This statement contains huge assumptions about the lengths of the input strings and the frequency of the letters 's' and 'p' in the input. And then has the chutzpah to call the compiler's failure to read his mind about this as "making no sense."

Good first effort by the author, but has not sufficiently thought through the problem.

twoodfin|2 years ago

This is the right answer:

https://news.ycombinator.com/item?id=36622584

Optimal assembly (forgoing SIMD, at least) for this loop on modern x86 is highly dependent on the entropy of the runtime data.

zokier|2 years ago

I wonder what superoptimizers like stoke and souper would do with this code.

bjourne|2 years ago

Don't get discouraged by the comments and that others made faster variants. I liked both your articles very much and learned a few new things.