top | item 41150080


opnitro | 1 year ago

I think this is a point of view that seems sensible, but that probably hasn't really thought through how this works. For example:

  some_array[i]
What should the compiler emit here? Should it emit a bounds check? And if the bounds check fails, what should it do? It is only because an out-of-bounds access is undefined behavior that the compiler can consistently generate code that omits the bounds check. (We don't need one, because if `i` is out of bounds the access is undefined, and the compiler may assume it never happens.)

If you think this is bad, then you're arguing against memory-unsafe languages in general. A sane position is the one Rust takes: by default, yes, you should always generate the bounds check (unless you can prove it always succeeds). But there will likely always be hot inner loops where we need to discharge the bounds checks statically. Ideally that would be done with some kind of formal reasoning support, but the industry is far from that at the moment.

For a more in-depth read: https://blog.regehr.org/archives/213


Dylan16807|1 year ago

> What should the compiler emit here?

It should emit an instruction to access memory location some_array + i.

That's all most people who complain about optimizations on undefined behavior want. Sometimes there are questions that are hard to answer, but in a situation like this, the answer is "try it and hope it doesn't corrupt memory." The behavior that's not wanted is for the compiler to wildly change behavior on purpose when something is undefined. For example, the compiler could optimize

  if(foo) {
      misbehaving_code();
      return puppies;
  } else {
      delete_data();
  }
into

  delete_data();

opnitro|1 year ago

I think the "do the normal thing" rule is very easy to say and very hard to do in general. Should every case of `a / b` inject a guard like `(b != 0) && !(a == INT_MIN && b == -1)`? If the guard fails, what should the program do? Or should the compiler assume those cases can't happen? Languages with rich runtimes get around this by having an agreed-upon way to signal errors, at the expense of runtime checking. An example directly stolen from the linked blog post:

  int stupid (int a) {
    return (a+1) > a;
  }
What should the compiler emit for this? Should it check for overflow, or should it emit the asm equivalent of `return 1`? If your answer is "check for overflow": should the compiler then be forced to check for overflow every time it increments an integer in a for loop? If your answer is "don't check": how do you explain this function behaving so strangely in the overflow case? The point I'm trying to get at is that "do the obvious thing" is completely dependent on context.

cobbal|1 year ago

Ah, but what if it writes so far off the end of the array that it clobbers another variable on the stack, one that is currently cached in a register? Should the compiler reload that register because the out-of-bounds write might have updated it? Probably not; let's just assume they didn't mean to do that and use the in-register version. That's taking advantage of undefined behavior to optimize a program.

tialaramex|1 year ago

> That's all most people that complain about optimizations on undefined behavior want

If this was true most of them could just adopt Rust where of course this isn't a problem.

But in fact they're often vehemently against Rust. They like C and C++, where they can write total nonsense that has no meaning but still compiles, and then blame the compiler for not reading their mind and doing whatever they thought it "obviously" should do.

uecker|1 year ago

Why not just turn off (or down) optimizations? I mean, optimization isn't even enabled by default.

g15jv2dp|1 year ago

> It should emit an instruction to access memory location some_array + i.

That's definitely what compilers emit. The UB comes from the fact that the compiler cannot guarantee how the actual memory will respond to that access. Will the OS kill your process? Will your bare-metal MCU silently return garbage? Will you corrupt your program state and jump into branches that should never be reached? Who knows. You're advocating for wild behavior without even realizing it.

As for your example: no, the compiler couldn't optimize it like that. You seem to have some misconceptions about UB. If `foo` is false in your code, then the behavior is completely defined.