top | item 39194973

(no title)

obl | 2 years ago

Weird that this treats uninitialized variables as unknown values. For example in ex3.c, the program

  int main(){
      int x;
      if (x <= 42){
              assert(x != 12345);
      }
  }

is of course UB in C, even though under a "uninitialized is random" model the program is valid and does not assert (as the model checker concludes).

(even in O1 clang gets rid of the whole function, including even the ret instruction, I'm surprised it does not at least leave an ud2 for an empty function to help debugging since it would not cost anything https://godbolt.org/z/eK8cz3EPe )

discuss

nanolith|2 years ago

Well, specifically, it counts uninitialized variables as being set to a non-deterministic value. The point of this tool isn't to optimize functions as a compiler would, but rather to find bad behavior based on its machine model.

This isn't perfect, of course. In this case, the compiler can rightly treat x as uninitialized, meaning that its value could be <= 42 at one point, and not be <= 42 at another point. Since the uninitialized variable isn't "pinned", it could technically be different values in different locations.

CBMC's machine model works differently. In this case, x is assigned a non-deterministic value. The branch condition creates a refinement of this value within the scope of that statement. If the branch is taken, then by SMT rules, x can't equal 12345, because it was already refined as being <= 42.

On its own, a model checker can miss situations like these. It's why I recommend -Wall -Werror -Wpedantic in conjunction with CBMC. The compiler should catch this as a warning, and it should be upgraded as an error.

zzo38computer|2 years ago

> In this case, the compiler can rightly treat x as uninitialized, meaning that its value could be <= 42 at one point, and not be <= 42 at another point. Since the uninitialized variable isn't "pinned", it could technically be different values in different locations.

LLVM has a "freeze" command to stop propagation of undefined values (although I think that command was added later than the first version), so that the value is "pinned" as you say. However, the "undef" command, if you do not use "freeze", will not do this.

I think that the C compiler should "pin" such undefined values where they are used, but I don't know which compilers have an option to do this. (Perhaps CBMC should also have such a switch, so that you can use the same options that you will use with the C compiler.)

With this ex.3 file, the optimizer should be allowed to result in a function that does nothing and has an unspecified return value (which might or might not be the same each time it is executed). (If optimizations are disabled, then it should actually compile the conditional branch instruction.)

kazinator|2 years ago

Since any use of the variable is undefined behavior, that's what we want to be informed about.

The reasoning about nondeterministic values should be spared for situations when it's other than undefined behavior. For instance, accessing an uninitialized structure member that came from malloc isn't undefined behavior. It's (I think) unspecified behavior. In an implementation that has trap representations for some types, it could hit a trap.

magicalhippo|2 years ago

Recall trying to find some fun bugs caused by uninitialized memory, which was made harder to find due to the fact that the debug runtime zeroed memory allocations while the release runtime did not.

Brian_K_White|2 years ago

The declaration didn't reserve the address? What else is the point of having a declaration?

I don't think accepting the random initial value means that the value may continue to change until the first time you set a value.

fweimer|2 years ago

Uninitialized local variables are documented as a source of a certain type of nondeterminism: http://www.cprover.org/cprover-manual/modeling/nondeterminis...

So the checker treats them as defined, but with an unknown variable. You could have written this instead:

    extern int unknown_int_value(void);
    int x = unknown_int_value();

And leaving unknown_int_value undefined (so it's not visible to the analyzer). Or write a function and use x as a parameter.

I suspect CBMC does this to have a convenient syntax for this frequent scenario. Apparently, it's used quite often, as in these examples: https://model-checking.github.io/cbmc-training/cbmc/overview...

It seems that CBMC is not intended to check production sources directly against C semantics, but to prove things about programs written in a C-like syntax.

philzook|2 years ago

I see the calling an undefined prototype style more often, perhaps for this reason. You are probably right this is undefined behavior, but it's subtle. https://stackoverflow.com/questions/11962457/why-is-using-an... I suspect CBMC just picks some concrete behavior for undefined behavior. It may not be a good detector for that. I'm not sure. This gets into shaky territory of understanding for me.

jcranmer|2 years ago

Speaking as a compiler writer:

Any invocation of undefined behavior should be considered an assertion failure as far as any model checker should be concerned. Compilers can--and will--treat undefined behavior as license to alter the semantics of your program without any constraint, and most instances of undefined behavior are clearly programmer error (there is no good reason to read uninitialized memory, for example).

Reading an uninitialized variable is not "subtle" undefined behavior. It's one of the most readily accessible examples not only of what undefined behavior can exist, but also the ways compilers will mutilate your code just because you did it. To be honest, if something as simple as the consequences of reading uninitialized memory are shaky understanding for someone trying to prove code correct, that will completely undermine any trust I have in the validity of your proofs.

philzook|2 years ago

For undefined behavior detection, I have heard of these:

- UB sanitizer https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

- Cerberus Semantics https://www.cl.cam.ac.uk/~pes20/cerberus/

- https://github.com/kframework/c-semantics

- https://github.com/TrustInSoft/tis-interpreter

philzook|2 years ago

https://github.com/diffblue/cbmc/issues/7732 I'll note that some form of undefined behavior checking / documentation is on the roadmap for the next major version