One day I was using Ghidra to decompile something to find out how it works, and an LLM helped a lot. It was a game changer for refactoring the decompiled assembly-that-looks-like-C output.
> Stiver decided to write his own decompiler as a side project. To overcome the weaknesses of existing alternatives, he took a different approach. After reading the bytecode, he constructed a control-flow graph in static single-assignment form, which is much better to express the program semantics abstracting the particular shape of bytecode. At the beginning of this project, Stiver knew little about static analysis and compiler design and had to learn a lot, but the effort was worth it. The resulting decompiler produced much better results than anything available at that time. It could even decompile the bytecode produced by some obfuscators without any explicit support.
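As a toy illustration of the lifting step the quote describes (this is not Stiver's actual implementation; the instruction names follow JVM bytecode, but everything else is invented): straight-line stack bytecode can be symbolically executed into SSA-style three-address code, giving each local-variable slot a fresh version on every store.

```java
import java.util.*;

public class SsaSketch {
    // Lift straight-line stack bytecode into SSA-style three-address code.
    // Only iload/iadd/istore are handled; no control flow, so no phi nodes.
    public static List<String> lift(String[] bytecode) {
        Deque<String> stack = new ArrayDeque<>();       // simulated operand stack
        List<String> ssa = new ArrayList<>();           // emitted SSA statements
        Map<Integer, Integer> version = new HashMap<>();// current SSA version per local slot
        int tmp = 0;                                    // counter for fresh temporaries
        for (String insn : bytecode) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload": {                         // push current version of the slot
                    int slot = Integer.parseInt(p[1]);
                    stack.push("v" + slot + "_" + version.getOrDefault(slot, 0));
                    break;
                }
                case "iadd": {                          // pop two operands, emit a temp
                    String b = stack.pop(), a = stack.pop();
                    String t = "t" + tmp++;
                    ssa.add(t + " = " + a + " + " + b);
                    stack.push(t);
                    break;
                }
                case "istore": {                        // new SSA version for the slot
                    int slot = Integer.parseInt(p[1]);
                    int v = version.merge(slot, 1, Integer::sum);
                    ssa.add("v" + slot + "_" + v + " = " + stack.pop());
                    break;
                }
            }
        }
        return ssa;
    }

    public static void main(String[] args) {
        System.out.println(lift(new String[]{"iload 0", "iload 1", "iadd", "istore 2"}));
    }
}
```

The real decompiler additionally builds a control-flow graph and places phi nodes at join points; the point of the sketch is only how the stack shape of the bytecode disappears once you name every value.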
The answer is pretty vague, but it sounds like it’s about not trying to “reverse” what the compiler did, but rather trying to “analytically” work out what source code would likely have yielded the bytecode it’s looking at?
As far as I can tell (although I'm a novice at RE), in the native world all non-trivial decompilers are "analytical", doing things like control-flow recovery and such. I guess the only reason the first Java decompilers were "non-analytical" is that the bytecode (at least in the early days) was simple enough that you could basically pattern-match it back to the source statements that produced it.
So if I'd have to give a definition I pulled out of my ass:
* non-analytical decompiler: "local", works only at the instruction or basic-block level, probably done by just pattern-matching templates
* analytical decompiler: anything that does non-local transformations, working across basic blocks to recover logic and control flow
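A minimal sketch of the "non-analytical" flavor under the definitions above (the template and all names are invented for illustration): slide fixed instruction templates over the stream, with no analysis across basic blocks at all.

```java
import java.util.*;

public class TemplateDecompiler {
    // Non-analytical decompilation: recognize fixed instruction templates,
    // emit a source statement per match, and fall back to commented-out
    // bytecode for anything the templates don't cover.
    static List<String> decompile(List<String> insns) {
        List<String> src = new ArrayList<>();
        int i = 0;
        while (i < insns.size()) {
            // template: iload n, iconst_1, iadd, istore n  ->  vN++;
            if (i + 3 < insns.size()
                    && insns.get(i).startsWith("iload ")
                    && insns.get(i + 1).equals("iconst_1")
                    && insns.get(i + 2).equals("iadd")
                    && insns.get(i + 3).equals("istore " + insns.get(i).substring(6))) {
                src.add("v" + insns.get(i).substring(6) + "++;");
                i += 4;
            } else {
                src.add("/* " + insns.get(i) + " */"); // unmatched instruction
                i++;
            }
        }
        return src;
    }

    public static void main(String[] args) {
        System.out.println(decompile(List.of("iload 1", "iconst_1", "iadd", "istore 1", "return")));
    }
}
```

The fallback branch is also roughly where "decompiled into commented-out bytecode" comes from when a template-based tool meets code it has no pattern for.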
I found this amusing, from a Java perspective. The 3-character command-line options are also very "not Java-ish". However, since this one is also written in Java, a good test is whether it can decompile itself perfectly and the result can be recompiled to a matching binary; much like how bootstrapping a compiler involves compiling itself and checking for a fixed point.
I mean, in the general case isn't it impossible to "put the code in the correct lines"?
Maybe I'm just misunderstanding you, but even if the bytecode sequence is reconstructed as the original code that produced it, stuff like whitespace and comments is simply lost, with no way to recover it.
(Also, local variable names, certain annotations depending on their retention level, etc)
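On the annotation point: whether an annotation is even present for a decompiler to find depends on its `@Retention`. A small demo (the class and annotation names are mine):

```java
import java.lang.annotation.*;

public class RetentionDemo {
    @Retention(RetentionPolicy.SOURCE)  // stripped by javac; never reaches the
    @interface SourceOnly {}            // class file, so no decompiler can see it

    @Retention(RetentionPolicy.RUNTIME) // stored in the class file and visible
    @interface RuntimeVisible {}        // to reflection; survives decompilation

    @SourceOnly @RuntimeVisible
    static void annotated() {}

    public static void main(String[] args) throws Exception {
        Annotation[] anns = RetentionDemo.class.getDeclaredMethod("annotated").getAnnotations();
        System.out.println(anns.length); // only @RuntimeVisible remains
    }
}
```

`CLASS`-retention annotations sit in between: they are written to the class file (so a decompiler can recover them) but are invisible to reflection at runtime.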
It's not a perfect decompiler; some obfuscated code gets decompiled into commented-out bytecode.
However, most of the time it'll output perfectly valid Java code that'll compile if you just create the necessary maven/ant/gradle build configuration to get all of the sources loaded correctly.
I've actually had this fix a bug before: an O(n^2) issue caused by adding a character at a time to a string inside a loop.
I had decompiled the class, fixed the issue, checked in the original decompiled source and then the change. Then a coworker pointed out that the original decompiled source also fixed the issue.
After a bit of digging, I learned that the HotSpot compiler had code to detect and fix the issue, but it was looking for the pattern generated by a modern compiler, and the library was compiled with an older compiler.
(It's been a while, but I think it was the JAI library, and the issue was triggered by long comments in a PNG.)
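For anyone curious what that pattern looks like, here is a generic reconstruction (I don't know the actual JAI code): appending with `+=` in a loop copies the whole prefix on every iteration, which is O(n^2), while a single `StringBuilder` keeps it linear. Older javac versions compiled each `+=` into a fresh per-iteration `StringBuilder`, which is roughly the shape HotSpot's string-concatenation optimization pattern-matches.

```java
public class StringConcat {
    // O(n^2): every += allocates a new String and copies everything so far
    static String slow(int n) {
        String s = "";
        for (int i = 0; i < n; i++) s += 'x';
        return s;
    }

    // O(n): one growable buffer reused across the whole loop
    static String fast(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) sb.append('x');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(slow(1000).equals(fast(1000))); // same result, very different cost
    }
}
```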
stevoski|5 months ago
vbezhenar|5 months ago
brap|5 months ago
Hackbraten|5 months ago
cogman10|5 months ago
p0w3n3d|5 months ago
asplake|5 months ago
That really deserves a link. What is an “analytical” decompiler?
lbalazscs|5 months ago
https://blog.jetbrains.com/idea/2024/11/in-memory-of-stiver/
jakewins|5 months ago
krackers|5 months ago
nneonneo|5 months ago
Over in .NET land, dnSpy (https://github.com/dnSpyEx/dnSpy) works very well, even on many obfuscated binaries.
userbinator|5 months ago
krackers|5 months ago
Igor_Wiwi|5 months ago
enoent|5 months ago
0x1ceb00da|5 months ago
mudkipdev|5 months ago
yardstick|5 months ago
p0w3n3d|5 months ago
bartekpacia|5 months ago
[I work at JetBrains]
gf000|5 months ago
hunterpayne|5 months ago
nunobrito|5 months ago
jeroenhd|5 months ago
dunham|5 months ago
BinaryIgor|5 months ago
techlatest_net|5 months ago
[deleted]