top | item 40787928

ainar-g | 1 year ago

> While the compiled programs stayed the same, we no longer get a warning (even with -Wall), even though both compilers can easily work out statically (e.g. via constant folding) that a division by zero occurs [4].

Are there any reasons why that is so? Do compilers not reuse the information they gather during compilation for diagnostics? Or is it a deliberate decision?

discuss

order

st_goliath|1 year ago

In the second example, the constant is propagated across expression/statement boundaries. That likely happened at the IR level, rather than at the AST level.

I'd imagine the generic case becomes a non-trivial problem if you don't want to produce fluke/useless diagnostic messages.

The compiler might already be several optimization passes in at this point, variables long since replaced by chained SSA registers, when it suddenly discovers that an IR instruction produces UB. That instruction might itself end up being eliminated in a subsequent pass, or depend entirely on a condition you can't statically determine. In the general case, by the point you definitely know, there might not be enough information left to reasonably map this back to a specific point in the input code, or to produce useful output explaining why the problem happens there.

shadowgovt|1 year ago

Correct. And to add to this answer slightly: there might not be enough information because, to keep all the context around, the compiler might need exponentially more memory (even quadratically more might be too much; program sizes really add up, and that can matter) to retain enough state to give a coherent error across phases/passes.

Back in the day, when RAM wasn't so cheap that you could find it in the bottom of a Rice Krispies box, I worked with a C++ codebase that required us to find a dedicated compilation machine because a standard developer loadout didn't have enough RAM to hold one of the compilation units in memory. Many of these tools (gcc in particular, given its pedigree) date back to an era where that kind of optimization mattered, and choosing between more eloquent error messages and the maximum size of program you could practically compile was a real trade-off.

jcranmer|1 year ago

There is a very strong bias in clang against emitting any diagnostics once you get to the middle-end optimizations, partially because the diagnostics are now based on the whims of heuristics, and partially because the diagnostics become hard to attribute to source code (the IR often has only a loose correlation to the original source). Indeed, once inlining and partial specialization kick in, even figuring out whether it is worth emitting a diagnostic is painfully difficult.

dzaima|1 year ago

Take for example the compiler optimizing:

    void foo(bool cond) {
      int a = 0;
      if (cond) a = 10;
      if (cond) printf("%d\n", 10 / a);
    }
into:

    void foo(bool cond) {
      int a = 0;
      if (cond) {
        a = 10;
        if (cond) printf("%d\n", 10 / a);
      } else {
        if (cond) printf("%d\n", 10 / a);
      }
    }
and then screaming that (in its generated 'else' block) there's a very clear '10 / 0'. Now, of course, you'd hope that the compiler would also recognize that the division sits in a never-taken-here 'if' (and in most cases it will), but in various situations it might not be able to (perhaps most notably when 'cond' is instead a function call that returns the same result in both places, which the compiler can't know).

Now, maybe there are some passes across which false positives would not be introduced, but that's a rather small subset, and you'd have to reorder the passes so that all of them run before any "destructive" ones, potentially having to duplicate passes to restore the previous optimization behavior, at which point it's not really "reusing".

gpderetta|1 year ago

I believe clang does not use data gathered during optimization for normal compilation diagnostics, to avoid them being dependent on compilation flags.

GCC does, but I guess this is just a case of a missed warning, possibly suppressed to avoid false positives.

Arnt|1 year ago

They do reuse information. But you have no guarantee that the point at which the information could be used runs after the point at which it is discovered.

They do try to order passes so that everything gets used. They also try to compile quickly. There is a conflict.