I used to do something like this all the time with C/C++ compiler tests. I tried lots of fancy tools, but I kept coming back to: expand all macros and retokenize to one token per line (I made a custom build of the preprocessor with this built in). Then have a shell script randomly remove lines, and use another script to check that the resulting test case still behaves consistently with the failure. It would run for a few hours (or days, for Boost problems), and usually you'd end up with a really minimal test case that shows the problem.

Often I would use this to find regressions: just have the shell script check that one compiler is good and the other has the problem. The resulting output would usually point exactly at the regressed feature, and make an amazing unit test.

Before leaving the compiler team, I wanted to find a way to do this at the AST level (so parens always go in pairs, etc.), but that could also entangle the reduction with the bug.
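The random-removal loop described above can be sketched in a few lines. This is a hedged illustration, not the commenter's actual scripts: `reduce_lines` and `still_fails` are hypothetical names, and the "interestingness" check here is a toy predicate standing in for the real script that reruns the compiler and compares failures.

```python
import random

def reduce_lines(lines, still_fails, rounds=1000):
    """Randomly drop a chunk of lines; keep the candidate only if the
    failure still reproduces (the 'behaves consistently' check)."""
    best = list(lines)
    for _ in range(rounds):
        if len(best) <= 1:
            break
        # Remove a random contiguous chunk, biased toward small chunks.
        n = random.randint(1, max(1, len(best) // 4))
        i = random.randrange(len(best) - n + 1)
        candidate = best[:i] + best[i + n:]
        if still_fails(candidate):
            best = candidate  # keep the smaller, still-failing test case
    return best

# Toy example: pretend the "bug" triggers whenever the token 'BAD' is present.
tokens = ["int", "x", ";", "BAD", "float", "y", ";"]
minimal = reduce_lines(tokens, lambda ls: "BAD" in ls)
```

With one token per line (as the macro-expanded, retokenized input provides), any kept candidate is still a valid token stream, which is why the preprocessing step matters: it makes blind line removal much more likely to produce syntactically plausible candidates.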
I wonder if LLMs could accelerate this by more intelligently removing stuff first, iteratively?
derdi|1 year ago
Perses is a reducer that works on an AST and claims to only try syntactically valid candidate inputs. It also claims to be language agnostic, and I don't know how these two things go together. But it does work nicely for Java for me. https://github.com/uw-pluverse/perses / https://faculty.cc.gatech.edu/~qzhang414/papers/icse18_cheng...
DRMacIver|1 year ago
Really, there's no such thing as a language-agnostic test-case reducer. shrink ray is much closer than most, but all that means is that it's got some heuristics that work well for a wide variety of common languages (e.g. the bracket-balancing thing). It's also got a bunch of language-specific passes.
This is sort of inherent to the problem: to get good results and good performance, a test-case reducer has to have a strong idea of what sorts of transformations are likely to work, which in turn means it has to have a strong idea of what sorts of languages it's likely to be run on.
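The bracket-balancing heuristic mentioned above is a good example of such a transformation filter. A minimal sketch (my own illustration, not shrink ray's implementation): before spending an expensive interestingness-test run on a candidate, cheaply reject any candidate whose brackets no longer pair up.

```python
def brackets_balanced(text):
    """Cheap syntactic filter: reject candidates whose (), [], {} no
    longer pair up, so the reducer wastes no test runs on them."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

# A reducer would only run the (expensive) interestingness test on
# candidates that pass this filter:
assert brackets_balanced("f(a[0]) { return x; }")
assert not brackets_balanced("f(a[0] { return x; }")
```

The heuristic works for C, Java, Python, JSON, and most common languages at once, which is exactly the sense in which such a reducer is "close to" language-agnostic without actually being so.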
anitil|1 year ago