How about this for an idea: a decompiler that uses Machine Learning to name the decompiled variables and functions. Would be nice even if it worked only sometimes.
For an AI-less solution there is IDA's Lumina, which works pretty well. There's also a reverse-engineered server for it [0], so you could build plugins for other disassemblers/decompilers to use with non-official servers.
It basically hashes machine code (with address parts removed) [1]. When reverse engineers label functions and push the symbols to the server (or get them from some debug build), others can pull them and see what the functions are called in completely unrelated projects that use the same libraries or contain the same functions.
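A rough sketch of that hash-and-lookup idea, in Python. This is not Lumina's actual hashing scheme or protocol: the use of SHA-256, the `mask_offsets` parameter, the toy `SymbolServer` class, and the name `init_logging` are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional, Set


def position_independent_hash(code: bytes, mask_offsets: Set[int]) -> str:
    """Hash function bytes after zeroing the bytes that hold addresses.

    mask_offsets is a hypothetical input: the byte offsets that a
    disassembler has flagged as address or relocation bytes.
    """
    masked = bytes(0 if i in mask_offsets else b for i, b in enumerate(code))
    return hashlib.sha256(masked).hexdigest()


class SymbolServer:
    """Toy in-memory stand-in for a shared symbol server (an assumption)."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def push(self, code: bytes, mask_offsets: Set[int], name: str) -> None:
        self._names[position_independent_hash(code, mask_offsets)] = name

    def pull(self, code: bytes, mask_offsets: Set[int]) -> Optional[str]:
        return self._names.get(position_independent_hash(code, mask_offsets))


# A call (0xE8 + 4-byte relative target) followed by ret (0xC3), as it might
# appear in two unrelated binaries where only the call target differs.
build_a = bytes([0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
build_b = bytes([0xE8, 0x7C, 0x21, 0x00, 0x00, 0xC3])
call_target_bytes = {1, 2, 3, 4}

server = SymbolServer()
server.push(build_a, call_target_bytes, "init_logging")
recovered = server.pull(build_b, call_target_bytes)
```

Because the address bytes are zeroed before hashing, the name pushed from one binary can be pulled from the other even though their raw bytes differ.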
I'm surprised nobody has mentioned DIRE[0] yet. They did exactly this and got some very impressive results.
[0]: https://arxiv.org/abs/1909.09029 / J. Lacomis et al., "DIRE: A Neural Approach to Decompiled Identifier Naming," 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019, pp. 628-639, doi: 10.1109/ASE.2019.00064.
It's certainly possible: compile all the C projects on GitHub with `gcc -O0`. Map statements, blocks, or functions to ASM output. Put everything in a giant SQL database. Repeat for all of gcc's compiler flags.
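A minimal sketch of the lookup-table idea, assuming an in-memory SQLite database. The table layout, the `normalize`, `record`, and `lookup` helpers, and the sample assembly are all hypothetical; the assembly is hand-written for illustration and is not real gcc output.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snippets ("
    " flags TEXT, asm TEXT, c_source TEXT,"
    " PRIMARY KEY (flags, asm))"
)


def normalize(asm: str) -> str:
    """Collapse whitespace so trivially different listings compare equal."""
    return "\n".join(" ".join(line.split()) for line in asm.strip().splitlines())


def record(flags: str, asm: str, c_source: str) -> None:
    """Store one (compiler flags, assembly) -> C source mapping."""
    conn.execute(
        "INSERT OR REPLACE INTO snippets VALUES (?, ?, ?)",
        (flags, normalize(asm), c_source),
    )


def lookup(flags: str, asm: str) -> Optional[str]:
    """Return the C source recorded for this assembly, if any."""
    row = conn.execute(
        "SELECT c_source FROM snippets WHERE flags = ? AND asm = ?",
        (flags, normalize(asm)),
    ).fetchone()
    return row[0] if row else None


# Hand-written placeholder listing, not actual compiler output.
record(
    "-O0",
    "movl %edi, %eax\naddl %esi, %eax\nret",
    "int add(int a, int b) { return a + b; }",
)
found = lookup("-O0", "  movl   %edi, %eax\n\taddl %esi, %eax\n  ret\n")
missing = lookup("-O0", "xorl %eax, %eax\nret")
```

An exact-match table like this only recognizes code it has already seen, which is roughly the limitation the joke points at.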
Wait, did I say it was possible? I'm curious what a neural netted compiler would produce. Probably your average CRUD software.
As Rust is LLVM-based, you don't need to compile it to C. Just write a backend that translates LLVM IR to C instead of x86_64. The IR is very C-looking; it's probably overly complex.
When compiling down to asm, a lot of information is lost regarding memory layout etc., so it's not the best source for generating code.
Some architectural/program assumptions may not be encoded in assembly or preserved in the asm -> C -> asm roundtrip, especially if the assemblers are for different architectures. The obvious example is pointer word size and memory model.
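The pointer word size point can be illustrated with a small Python model. The `store_pointer` helper and the sample address are assumptions for illustration; it only models what happens when an address is kept in a slot narrower than the one it came from.

```python
import ctypes

# Pointer width of the machine running this script, in bits.
HOST_POINTER_BITS = ctypes.sizeof(ctypes.c_void_p) * 8


def store_pointer(address: int, pointer_bits: int) -> int:
    """Model storing an address in a pointer-sized slot of the given width."""
    return address & ((1 << pointer_bits) - 1)


# A made-up 64-bit address with non-zero upper bits.
address = 0x00007FFFDEADBEEF

as_64 = store_pointer(address, 64)  # survives unchanged
as_32 = store_pointer(address, 32)  # upper 32 bits are dropped

round_trips_on_64 = as_64 == address
round_trips_on_32 = as_32 == address
```

Code that silently assumed a 64-bit pointer would keep working on the first target and break on the second, and nothing in the assembly itself records that assumption.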
It wouldn't look much like what you'd expect C to look like. If it did decompile back to idiomatic C it could introduce some kind of aliasing bug along the way that'd make it not so memory safe.
A critique for the author: include pictures of the decompiled output so people can get a feel for it and judge the quality. I wanted to check it out, but I was on my phone, so I can't quite build and run the repo.
kubb | 3 years ago
kuroguro | 3 years ago
[0] https://abda.nl/lumen/ [1] https://github.com/naim94a/lumen/issues/2
Andoryuuta | 3 years ago
[1]: https://github.com/pcyin/dire
glouwbug | 3 years ago
efferifick | 3 years ago
planede | 3 years ago
AlexDenisov | 3 years ago
blueflow | 3 years ago
kalnins | 3 years ago
pjc50 | 3 years ago
tym0 | 3 years ago
Edit: Sorry I misremembered, they seem to compile C/C++ code to WASM then back to C: https://hacks.mozilla.org/2021/12/webassembly-and-back-again...
Although technically the plugins could be written in Rust.
vnorilo | 3 years ago
thargor90 | 3 years ago
I don't think this would work, unless the C file just contains inline assembly.
speed_spread | 3 years ago
Not that the C will be much more readable than the disassembly, but there's a chance less information will be lost.
astrange | 3 years ago
shin_lao | 3 years ago
secondcoming | 3 years ago
orra | 3 years ago
Of course, times change and AFAICT Ghidra has taken up that mantle.
mdaniel | 3 years ago
xphos | 3 years ago
richardfey | 3 years ago
marcelluscat | 3 years ago