top | item 45171515

(no title)

enedil | 5 months ago

Let me play devil's advocate: for some reason, functions such as strcpy in glibc have multiple runtime implementations and are selected by the dynamic linker at load time.

discuss

order

mort96|5 months ago

And there's a performance cost to that. If there was only one implementation of strcpy and it was the version that happens to be picked on my particular computer, and that implementation was in a header so that it could be inlined by my compiler, my programs would execute faster. The downside would be that my compiled program would only work on CPUs with the relevant instructions.

You could also have only one implementation of strcpy and use no exotic instructions. That would also be faster for small inputs, for the same reasons.

Having multiple implementations of strcpy selected at runtime optimizes for a combination of binary portability between different CPUs and for performance on long input, at the cost of performance for short inputs. Maybe this makes sense for strcpy, but it doesn't make sense for all functions.

duped|5 months ago

> my programs would execute faster

You can't really state this with any degree of certainty when talking about whole-program optimization and function inlining. Even with LTO today you're talking 2-3% overall improvement in execution time, without getting into the tradeoffs.

crest|5 months ago

Afaik runtime linkers can't convert a function call into a single non-call instruction.

PhilipRoman|5 months ago

Linux kernel has an interesting optimization using the ALTERNATIVE macro, where you can directly specify one of two instructions and it will be patched at runtime depending on cpu flags. No function calls needed (although you can have a function call as one of the instructions). It's a bit more messy in userspace where you have to respect platform page flags, etc. but it should be possible.