
CFLAddLoader | 1 year ago

The expected outcome of using an LLM to decompile is a binary so wildly different from the original that the two cannot even be compared.

If you only make mistakes very rarely and in places that don't cause cascading analysis mistakes, you can recover. But if you keep making mistakes all over the place and vastly misjudge the structure of the program over and over, the entire output is garbage.


Vt71fcAqt7|1 year ago

That makes sense. So it can work for small functions but not an entire codebase, which is the goal. Does that sound correct? If so, is it useful for small functions (say I identify some sections of code I think are important because they modify some memory location), or is this not useful?

CFLAddLoader|1 year ago

There are lots of parts of analysis that really matter for readability but aren't used as inputs to other analysis phases, so mistakes there are okay.

Things like function and variable names. Letting an LLM pick them would be perfectly fine, as long as you make sure the names are valid and not duplicates before outputting the final code.
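As a rough sketch of what that validation pass might look like (names and fallback scheme here are hypothetical, just to illustrate the idea):

```python
import keyword

def sanitize_names(proposed, fallback_prefix="var"):
    """Validate LLM-proposed identifiers and deduplicate them.

    `proposed` maps original symbols (e.g. raw addresses) to the
    names the LLM suggested. Invalid suggestions fall back to a
    generated placeholder; duplicates get a numeric suffix.
    """
    used = set()
    result = {}
    for i, (symbol, name) in enumerate(proposed.items()):
        # Reject names that aren't valid identifiers or that shadow keywords.
        if not name.isidentifier() or keyword.iskeyword(name):
            name = f"{fallback_prefix}_{i}"
        # Deduplicate by appending a numeric suffix until the name is unused.
        candidate, n = name, 1
        while candidate in used:
            candidate = f"{name}_{n}"
            n += 1
        used.add(candidate)
        result[symbol] = candidate
    return result
```

Even if the LLM's picks are bad, the worst case after a pass like this is ugly but compilable output, which is exactly why naming is a safe place to use it.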

Or if there are several valid ways to display some really weird control flow structure, letting an LLM pick which one to use would be fine.

Same for deciding what code goes in which files and what the filenames should be.

Letting the LLM comment the code as it comes out would work too, since if the comments turn out to be misleading you can just ignore or remove them.