nneonneo | 2 months ago
As an example, consider this code (godbolt: https://godbolt.org/z/TrMrYTKG9):
  struct foo {
      unsigned char a, b;
  };

  foo make(int x) {
      foo result;
      if (x) {
          result.a = 13;
      } else {
          result.b = 37;
      }
      return result;
  }
At high enough optimization levels, the function compiles to “mov eax, 9485; ret”, which sets both a=13 and b=37 without testing the condition at all - as if both branches of the test were executed. This is perfectly reasonable because the lack of initialization means the values could already have been set that way (even if unlikely), so the compiler just goes ahead and sets them that way. It’s faster!
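For anyone wondering where 9485 comes from, here is a minimal sketch of the arithmetic. It assumes the two-byte struct comes back in eax with a in the low byte and b in the next byte (little-endian x86-64); the helper name packed is made up for illustration.

```cpp
#include <cstdint>

struct foo {
    unsigned char a, b;
};

// Hypothetical helper: the integer a register would hold if `a` sits in
// the low byte and `b` in the next byte (assumed little-endian layout).
constexpr std::uint32_t packed(foo f) {
    return static_cast<std::uint32_t>(f.a) |
           (static_cast<std::uint32_t>(f.b) << 8);
}

// 13 + 37 * 256 == 9485 == 0x250D
static_assert(packed(foo{13, 37}) == 9485, "a=13 in byte 0, b=37 in byte 1");
```

So the single constant carries both assignments at once: 0x0D in the low byte and 0x25 above it.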
titzer|2 months ago
"But it's right there in the name!" Undefined behavior literally places no restrictions on the code generated or the behavior of the program. And the compiler is under no obligation to help you debug your (admittedly buggy) program. It can literally delete your program and replace it with something else that it likes.
[1] https://kristerw.blogspot.com/2017/09/why-undefined-behavior...
jmgao|2 months ago
The compiler sees that foo can only be assigned in one place (that isn't called locally, but could be called from other object files linked into the program) and its address never escapes. Since dereferencing a null pointer is UB, it can legally assume that `*foo` is always 42 and optimizes out the variable entirely.
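The snippet being discussed isn't visible in this part of the thread, so the following is only a hypothetical reconstruction of the shape described above. The names set_foo and read_foo are assumptions, not the original code.

```cpp
// File-local pointer: nothing outside this translation unit can name it,
// and its address is never taken.
static int *foo = nullptr;

// The only place foo is ever assigned. It has external linkage, so
// another object file could call it before read_foo() runs.
void set_foo() {
    static int value = 42;
    foo = &value;
}

// If set_foo() was never called, this dereferences a null pointer, which
// is UB. A compiler may therefore assume set_foo() did run and treat
// this function as if it simply returned 42.
int read_foo() {
    return *foo;
}
```

Calling set_foo() and then read_foo() is well defined and returns 42. The argument in the comment is about what the optimizer is allowed to assume when that call order is not visible.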
publicdebates|2 months ago
Compilers can do whatever they want when they see UB, and accessing an unassigned and unassignable (file-local) variable is UB, therefore the compiler can just decide that *foo is in fact always 42, or never 42, or sometimes 42, and all would be just as valid options for the compiler.
(I know I'm just restating the parent comment, but I had to think it through several times before understanding it myself, even after reading that.)
recursivecaveat|2 months ago
masklinn|2 months ago
userbinator|2 months ago
pornel|2 months ago
This is because the code is executed symbolically during optimization. It's not running on your real CPU. It's first "run" on a simulation of an abstract machine from the C spec, which doesn't have registers or even a real stack to hold an actual garbage value, but it does have magic memory where bits can be set to 0, 1, or this-can-never-ever-happen.
Optimization passes ask questions like "is x unused? (so I can skip saving its register)" or "is x always equal to y? (so I can stop storing it separately)" or "is this condition using x always true? (so that I can remove the else branch)". When using the value is undefined behavior, there's no requirement for these answers to be consistent or even correct, so the optimizer rolls with whatever seems cheapest/easiest.
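A toy model of that "magic memory" may help. This is a sketch only, not how any particular compiler is implemented: AbstractValue and merge are hypothetical names, and the three states stand in for "never assigned", "known constant", and "depends on runtime input".

```cpp
// Hypothetical abstract value tracked by an optimizer for one variable.
struct AbstractValue {
    enum Kind { Undef, Constant, Varying } kind;
    int value;  // meaningful only when kind == Constant
};

// Combine the values arriving from two branches at a join point.
// An Undef side imposes no constraint, so the other side wins outright.
AbstractValue merge(AbstractValue lhs, AbstractValue rhs) {
    if (lhs.kind == AbstractValue::Undef) return rhs;
    if (rhs.kind == AbstractValue::Undef) return lhs;
    if (lhs.kind == AbstractValue::Constant &&
        rhs.kind == AbstractValue::Constant &&
        lhs.value == rhs.value) {
        return lhs;  // both branches agree on one constant
    }
    return AbstractValue{AbstractValue::Varying, 0};  // needs a runtime select
}
```

Applied to the make() example upthread: result.a is Constant 13 on one path and Undef on the other, so the merge is Constant 13 and no test of x is needed. The same reasoning gives Constant 37 for result.b.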
like_any_other|2 months ago
sethev|2 months ago
Good example of why uninitialized variables are not intuitive.
masklinn|2 months ago
quietbritishjim|2 months ago
It can just leave the result totally uninitialised. That's because both code paths have undefined behaviour: whichever of result.a or result.b is not set is still copied at "return result" which is undefined behaviour, so the overall function has undefined behaviour either way.
It could even just replace the function body with abort(), or omit the implementation entirely (even the ret instruction, allowing execution to just fall through to whatever memory happens to follow). Whether any compiler does that in practice is another matter.
masklinn|2 months ago
That is incorrect, per the resolution of DR222 (partially initialized structures) at WG14:
> This DR asks the question of whether or not struct assignment is well defined when the source of the assignment is a struct, some of whose members have not been given a value. There was consensus that this should be well defined because of common usage, including the standard-specified structure struct tm.
As long as the caller doesn't read an uninitialised member, it's completely fine.
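A sketch of what such a caller could look like, under the reading cited above. DR222 is a C resolution, and this assumes it carries over to the snippet as posted; read_assigned_member is a hypothetical name.

```cpp
struct foo {
    unsigned char a, b;
};

// Same function as in the top comment: exactly one member is assigned.
foo make(int x) {
    foo result;
    if (x) {
        result.a = 13;
    } else {
        result.b = 37;
    }
    return result;
}

// Hypothetical caller that only ever reads the member make() assigned
// for the given x, and never touches the other one.
int read_assigned_member(int x) {
    foo f = make(x);
    return x ? f.a : f.b;
}
```

With x nonzero this reads a and gets 13; with x zero it reads b and gets 37. That is consistent with the compiled "mov eax, 9485", which sets both.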
arrowsmith|2 months ago
Negitivefrags|2 months ago
The code says that if x is true then a=13 and if it is false then b=37.
This is the case. It's just that a=13 even if x is false. A thing that the code had nothing to say about, and so the compiler is free to do.
throwatdem12311|2 months ago
Same for b. If x is true, b could be 37 no matter how unlikely that is.
xboxnolifes|2 months ago
tehjoker|2 months ago