> "Unknown" ranges from 49% to 76%.

Yeah, this is interesting. They're saying they can't determine whether a pointer targets an array buffer or not? Perhaps they might want to take a look at the (long neglected) "C to SaferCPlusPlus" translator[1] which can do this. (It was an unexpectedly taxing undertaking though.) It converts C arrays and allocated buffers used as arrays into memory safe implementations of std::array<>s and std::vector<>s, so failure to properly identify them would generally result in output code that wouldn't compile.
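To make the array-to-container idea concrete, here is a hand-written sketch of what such a conversion amounts to. It is not output from the translator: it uses plain std::array and std::vector with .at() as stand-ins for the library's own memory-safe types, and the function names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Original C shape (unsafe), shown only as a comment:
//     int fixed[4];
//     int* buf = (int*)malloc(n * sizeof(int));
//     buf[n] = 1;   // out-of-bounds write, undefined behavior

// Hand-written counterpart: the size travels with the container.
inline int sum_fixed() {
    std::array<int, 4> fixed{1, 2, 3, 4};   // was: int fixed[4]
    int total = 0;
    for (std::size_t i = 0; i < fixed.size(); ++i) total += fixed.at(i);
    return total;
}

// Returns true when the one-past-the-end write is rejected.
inline bool write_past_end_is_caught(std::size_t n) {
    std::vector<int> buf(n);                // was: malloc(n * sizeof(int))
    try {
        buf.at(n) = 1;                      // checked access throws std::out_of_range
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Small self-check showing the intended use of the two helpers.
inline void self_check() {
    assert(sum_fixed() == 10);
    assert(write_past_end_is_caught(3));
}
```

The point of the sketch is only that a buffer misidentified as a single object (or the reverse) would no longer type-check after this kind of rewrite, which is why identification has to be right.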

The examples they give of problematic code in the paper:

    void f(int* a) {
        *(int**)a = a;
    }

and

    f1(((int*) 0x8f8000));

don't strike me as the kind you would often encounter in real-world code.

> The syntax they use is rather clunky

The output code of the "C to SaferCPlusPlus" translator replaces the types and declarations with macros[2] that can be redefined with a compile-time directive to either use the safe C++ implementation or revert to the original unsafe native C implementation. The argument is that using macros instead of custom syntax makes the source code more versatile, and existing C programmers already "get" macros.
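A rough sketch of the switchable-macro idea follows. The macro names (SKETCH_ARRAY, SKETCH_AT) and the SKETCH_USE_NATIVE_ARRAYS switch are invented for this example and are not the names the library uses; std::array stands in for the safe implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical macro names, invented for this sketch (not the library's).
#ifdef SKETCH_USE_NATIVE_ARRAYS
  // Revert to the original unsafe C declaration and raw subscripting.
  #define SKETCH_ARRAY(type, name, size) type name[size]
  #define SKETCH_AT(arr, i) (arr)[(i)]
#else
  // Safe build: std::array with bounds-checked element access.
  #define SKETCH_ARRAY(type, name, size) std::array<type, size> name
  #define SKETCH_AT(arr, i) (arr).at(i)
#endif

// The same source compiles in either mode; only the macro definitions change.
inline int sum_squares() {
    SKETCH_ARRAY(int, values, 4);
    for (std::size_t i = 0; i < 4; ++i) {
        SKETCH_AT(values, i) = static_cast<int>(i * i);
    }
    int total = 0;
    for (std::size_t i = 0; i < 4; ++i) total += SKETCH_AT(values, i);
    return total;
}
```

Building with -DSKETCH_USE_NATIVE_ARRAYS would select the native array version, and building without it selects the checked one, with no change to the function body.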

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...

[2] https://github.com/duneroadrunner/SaferCPlusPlus/blob/master...

Animats|7 years ago

Saw this in the translated "SaferCPlusPlus" output examples [1].

    static void string_set(char** out, const char* in)

What happened there? Where are the array types? Wrong place to look?

If inference can't make a definitely good decision, maybe translators should guess, conservatively. That is, if it looks like something needs an array type parameter, make it an array type parameter with subscript checking. Then run tests on the translated program and see if that works. That's what humans do on such code. Machine learning has potential here. For any array in a working program, there must be some expression of some variables that expresses the size of the array. If humans can't find that expression, the program is unmaintainable and probably has a bug.

There are really 3 cases.

1. This is a pointer, and it's never subscripted or offset. That's a pointer to a single instance of something.

2. This is a pointer which is subscripted or offset, and we can tell from context how big the array is.

3. This is a pointer which is subscripted or offset, but auto-translation fails to figure out how big the array is supposed to be.

The problem is to convert (3) into (2).
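One way to picture the three cases, and what turning (3) into (2) might look like, is the sketch below. CheckedInts and the function names are hypothetical helpers written for this example, not anything a particular translator emits; the idea is just that the pointer travels with a length and every subscript is checked.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Case 1: never subscripted or offset, so it points to a single object.
inline void increment(int* p) { ++*p; }

// Case 3 (original C shape): subscripted, but the extent is stated nowhere.
//     int element(const int* a, int i) { return a[i]; }

// Case 2 after conversion: pointer plus length, with checked subscripting.
struct CheckedInts {              // hypothetical helper type for this sketch
    const int* data;
    std::size_t size;
    const int& operator[](std::size_t i) const {
        if (i >= size) throw std::out_of_range("CheckedInts index");
        return data[i];
    }
};

inline int element(CheckedInts a, std::size_t i) { return a[i]; }

// Returns true when index i is rejected by the bounds check.
inline bool out_of_range_detected(CheckedInts a, std::size_t i) {
    try {
        (void)a[i];
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With a conservative guess like this in place, running the program's existing tests would show whether the guessed size expression holds: a wrong guess surfaces as a thrown exception rather than silent memory corruption.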

I tend to think that a good metric for C code quality is how hard that is. If it's not obvious by looking how big something is supposed to be, there's probably a potential bug.

[1] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...

duneroadrunner|7 years ago

Thanks for noticing :) It's been quite a while since I worked on the code, but I believe that the translator intentionally left types declared as "char *" unmodified, assuming that they were being used as strings [1] rather than "regular" array buffers. I'm guessing that dealing with strings would have been a lot more work because it would require providing safe compatible replacements for all the standard C library string functions.

I think you should find that array buffers of other types, like "unsigned char" or "const unsigned char", and their associated pointer iterators are translated to their corresponding macros. I'd be interested if you find otherwise. If you're interested, the relevant code for the translator is in the "safercpp" subdirectory [2]. It's not super-well commented so if you have any questions feel free to post them in the "issues" section of the repository.

[1] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...

[2] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...