top | item 43309849

tomn | 11 months ago

Another solution is to just cast the result to a uint8_t; with this, Clang 19.1.0 gives the same assembly:

https://gcc.godbolt.org/z/E5oTW5eKe
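For readers who don't want to open the link, the cast approach probably looks something like the sketch below. This is a guess at the shape of the code, not a copy of the Godbolt snippet: the function name and the "is even" predicate are assumptions, and it only makes sense if the caller guarantees the count fits in 8 bits.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical name and predicate, for illustration only.
// Assumes the true count fits in a uint8_t; otherwise the result
// wraps modulo 256, because the cast truncates.
std::uint8_t count_even_values_cast(const std::vector<std::uint8_t>& vec) {
    return static_cast<std::uint8_t>(
        std::count_if(vec.begin(), vec.end(),
                      [](std::uint8_t x) { return x % 2 == 0; }));
}
```

Whether the compiler then picks an 8-bit accumulator for the vectorized loop depends on the compiler and version, so it is worth checking the assembly for your own toolchain.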

nicula|11 months ago

Like @wffurr mentioned, this is indeed discussed in a footnote. I just added another remark to the same footnote:

"It's also debatable whether or not Clang's 'optimization' results in better codegen in most cases that you care about. The same optimization pass can backfire pretty easily, because it can go the other way around too. For example, if you assigned the `std::count_if()` result to a local `uint8_t` value, but then returned that value as a `uint64_t` from the function, then Clang will assume that you wanted a `uint64_t` accumulator all along, and thus generates the poor vectorization, not the efficient one."
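A rough sketch of the case described in that footnote, for anyone who wants to try it in Compiler Explorer. The function name and predicate are assumptions, not taken from the article, and the codegen you get may differ between compiler versions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical name and predicate, for illustration only.
// The count is narrowed into a uint8_t local and then widened again
// by the uint64_t return type.
std::uint64_t count_even_values_widened(const std::vector<std::uint8_t>& vec) {
    const std::uint8_t narrow = static_cast<std::uint8_t>(
        std::count_if(vec.begin(), vec.end(),
                      [](std::uint8_t x) { return x % 2 == 0; }));
    return narrow;  // zero-extended; the value is still the count modulo 256
}
```

Note that the returned value is still the count modulo 256, so the compiler cannot simply drop the truncation without changing results for large inputs.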

tomn|11 months ago

I'm not sure how "it can go the other way around too" -- in that case (assigning to a uint8_t local variable), it seems like that particular optimisation is just not being applied.

Interestingly, if the local variable is "volatile uint8_t", the optimisation is applied. Perhaps with a uint8_t local variable and size_t return value, an earlier optimisation removes the cast to uint8_t, because it only has an effect when undefined behaviour has been triggered? It would certainly be interesting to investigate further.

In general I agree that being more explicit is better if you really care about performance. It would be great if languages provided more ways to specify this kind of thing. I tried using __builtin_expect to trigger this optimisation too, but no dice.
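One way to be explicit, as a minimal sketch: write the counting loop yourself and make the accumulator type a template parameter. The names `count_if_as` and `count_even_values_explicit` are made up for this example, and I haven't checked what assembly each compiler produces for it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: like std::count_if, but the caller picks the
// accumulator type instead of inheriting the iterator's difference_type.
template <typename Acc, typename It, typename Pred>
Acc count_if_as(It first, It last, Pred pred) {
    Acc result = 0;
    for (; first != last; ++first) {
        if (pred(*first)) {
            ++result;  // wraps on overflow when Acc is an unsigned type
        }
    }
    return result;
}

// Assumes the caller knows the count fits in a uint8_t.
std::uint8_t count_even_values_explicit(const std::vector<std::uint8_t>& vec) {
    return count_if_as<std::uint8_t>(vec.begin(), vec.end(),
                                     [](std::uint8_t x) { return x % 2 == 0; });
}
```

The accumulator width is then part of the source rather than something the optimizer has to infer from how the result is used.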

Anyway, thanks for the interesting article.

gblargg|11 months ago

I was hoping you could just provide an iterator_traits with a uint8_t difference type, but this is tied to the iterator type rather than specified separately, so you'd need some kind of iterator wrapper to do this.

tomn|11 months ago

Yeah, I thought about that too, but if you want to process more than 255 values this might not be valid, depending on the implementation of count_if.

wffurr|11 months ago

Which is discussed in the post and doesn’t work in GCC.

tomn|11 months ago

Oh right, I missed it in a couple of passes (and when searching for "cast"); for anyone else looking, it's in the 3rd footnote. Thanks.