item 32079227

Decompiler Explorer

376 points | todsacerdoti | 3 years ago | dogbolt.org

82 comments

[+] psifertex|3 years ago|reply
Sorry for the outages, friends. We're actively working on getting it to handle higher load, but we knew that if we hit HN we'd be swamped no matter what we did. We're spinning up more workers and fixing obvious perf issues as we see them, but if it's not available when you try it, make sure to check back later!
[+] athrowaway3z|3 years ago|reply
I really appreciate these kinds of websites.

But I wonder if we'll eventually come full circle, and it will become easier and cheaper to send a Wasm Linux kernel with virtual disk access over WebSockets than to process stuff server-side.

[+] meibo|3 years ago|reply
I feel like the decompiler space is a little stuck? I mostly go with Hex-Rays out of habit and because I'm used to IDA, but I haven't really seen x64 decompiler output noticeably improve in recent releases.

A lot of my colleagues use Ghidra a lot now and complain about its decompiler regularly.

Is there any new approach in the works? Maybe something ML-based for optimization? Would be sad if Hex-Rays output is "as good as it's gonna get".

[+] tralarpa|3 years ago|reply
> A lot of my colleagues use Ghidra a lot now and complain about its decompiler regularly.

Are your colleagues decompiling obfuscated code (for example, malware)? Publicly available decompilers don't work well for that, but I assume that many specialists have their own little improvements and plugins that they don't share with others because it's their core business.

For non-obfuscated code, Ghidra has served me very well, even for entire applications. Often it has to be pushed in the right direction (for example, by manually specifying the type of a variable), and it sometimes misses obvious simplifications, especially when arrays are involved, but I think those issues could be solved relatively easily by polishing/extending its heuristics. Nothing where I would say ML is needed, although it would be possible. In the end, most programs contain the same patterns, and an ML-based system could help identify them.
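To make "specifying the type of a variable" concrete: the same four bytes mean very different things depending on the declared type, which is why retyping one variable can change a whole function's output. A minimal sketch, assuming a little-endian target; `views` is a hypothetical helper, and the bytes are made up for illustration.

```python
import struct

def views(raw: bytes) -> dict:
    """Interpret 4 raw bytes under several candidate C types (little-endian)."""
    assert len(raw) == 4, "expects exactly 4 bytes"
    return {
        "uint32_t": struct.unpack("<I", raw)[0],
        "int16_t[2]": struct.unpack("<2h", raw),
        "float": struct.unpack("<f", raw)[0],
        "char[4]": raw,
    }

# Hypothetical bytes from a stack slot the decompiler guessed was an integer.
for ctype, value in views(bytes([0x00, 0x00, 0x80, 0x3F])).items():
    print(f"{ctype:>10}: {value!r}")
```

Shown as a `uint32_t` the slot holds 1065353216; retyped as `float` it is simply 1.0.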

But yeah, obfuscated code, that's something else. There are some academic publications about the usage of ML for that. No idea what's happening inside the company labs, though.

[+] ishitatsuyuki|3 years ago|reply
Rellic [1] implements an algorithm that generates goto-free control flow (citation in the README), which would be a significant improvement over what Ghidra/IDA currently generate.

Unfortunately, the pieces around Rellic don't look very well maintained, and getting it to build is close to rocket science. It also doesn't have as much UI/GUI as Ghidra, so it's a bit far from accessible right now.

[1]: https://github.com/lifting-bits/rellic

[+] hoosieree|3 years ago|reply
> Is there any new approach in the works? Maybe something ML-based for optimization?

I'm doing a PhD on this.

My goal is to detect known functions in obfuscated binaries.

The biggest challenge by far is building a good dataset. Unlike computer vision (millions of pictures labeled "dog"), the number of training examples for a typical function is one. For now I'm focusing on C standard libraries, since there are a handful of real-world implementations plus some FOSS or student samples available for things like strlen and atoi.
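For contrast with the ML approach, here is a much simpler non-ML baseline, swapped in purely for illustration (it is not the method described above): behavioral matching, where an unlabeled function is run on probe inputs and compared against reference implementations. It assumes the unknown function has already been lifted into something callable, modeled here as a Python function; all names and probe inputs are hypothetical. Behavior-preserving obfuscation doesn't fool it in principle, but it needs a safe way to execute the code and a well-chosen probe set.

```python
def ref_strlen(s: bytes) -> int:
    """Reference strlen: number of bytes before the first NUL."""
    end = s.find(b"\0")
    return len(s) if end == -1 else end

def ref_atoi(s: bytes) -> int:
    """Reference atoi: skip whitespace, optional sign, then decimal digits."""
    i, n = 0, len(s)
    while i < n and s[i:i + 1].isspace():
        i += 1
    sign = 1
    if i < n and s[i:i + 1] in (b"+", b"-"):
        sign = -1 if s[i:i + 1] == b"-" else 1
        i += 1
    value = 0
    while i < n and s[i:i + 1].isdigit():
        value = value * 10 + (s[i] - ord("0"))
        i += 1
    return sign * value

REFERENCES = {"strlen": ref_strlen, "atoi": ref_atoi}
PROBES = [b"", b"abc", b"  42", b"-17x", b"hello\0world"]

def identify(candidate) -> list:
    """Names of reference functions whose outputs match `candidate` on every probe."""
    matches = []
    for name, ref in REFERENCES.items():
        try:
            if all(candidate(p) == ref(p) for p in PROBES):
                matches.append(name)
        except Exception:  # a crash on any probe simply means "no match"
            continue
    return matches

def mystery(s: bytes) -> int:
    # Stand-in for a lifted, unlabeled function: a hand-rolled strlen loop.
    count = 0
    for byte in s:
        if byte == 0:
            break
        count += 1
    return count

print(identify(mystery))  # ['strlen']
```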

If anyone wants to collaborate, feel free to message me.

[+] develatio|3 years ago|reply
I hear good things about Binary Ninja!
[+] baby|3 years ago|reply
I always found it odd that IDA Pro was such a pile of poop when it probably made sooo much money.
[+] ykl|3 years ago|reply
Love the joke in the URL. :)

(For anyone who doesn't get it: it's a play on Godbolt.)

[+] jraph|3 years ago|reply
What's really funny about this is that Godbolt is the last name of Compiler Explorer's author. But it seems like it's a brand, a word of its own, now.

Being able to swap two letters in a name and get something this nice is lucky.

Godbolt is quite a name.

[+] psifertex|3 years ago|reply
Thanks! We debated it some internally and I'm glad it won out, I think it's worth it. Plus, it has a nice logo that goes with it.
[+] mwcampbell|3 years ago|reply
Can any of these decompilers make effective use of a Microsoft PDB file, if I have one, to include original symbols in the decompiled output? What I'd really like to do with a decompiler is feed it a final compiled EXE or DLL of my own code and see what it looks like after it's been run through whole-program optimization. In that case, of course, I have a PDB file.
[+] spaintech|3 years ago|reply
Ha! That was funny. I wonder, though: given the tons of code it gets fed, couldn't Godbolt use source -> compiled object -> assembly pairs as a means to train an AI decompiler? Food for thought.
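As a rough illustration of the (source, compiled output) pairs such a training set would consist of, this sketch uses CPython's own compiler and the standard `dis` module as a stand-in for a C compiler and its assembly listings, so it runs anywhere Python does. `make_pairs` is a hypothetical name; a real dataset for native decompilation would compile C or C++ under each compiler and flag combination of interest, and nothing here reflects what Compiler Explorer actually stores.

```python
import dis

def make_pairs(snippets):
    """Pair each source snippet with its compiled form as (source, listing) tuples."""
    pairs = []
    for src in snippets:
        code = compile(src, "<snippet>", "eval")  # source -> "object code"
        listing = dis.Bytecode(code).dis()        # object code -> readable "assembly"
        pairs.append((src, listing))
    return pairs

# A learned decompiler would be trained to map each listing back to its source.
for src, listing in make_pairs(["x * 14", "(x << 4) - x - x"]):
    print(src)
    print(listing)
```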
[+] KMnO4|3 years ago|reply
I've always wondered about this. Compilers do a LOT of irreversible stuff. For example, symbol names usually aren't needed (unless you have a reflective language).

Where AI would really shine is reversing the (only seemingly reversible) optimizations. For example, GCC converts "x * 14" into "(x << 4) - x - x". Of course, you can never be 100% sure the programmer didn't actually want "shift left by four followed by two subtractions", but I'm convinced that 99% of the code I write is fairly predictable and statistically similar to whatever giant codebase you train it on.
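The arithmetic behind the `x * 14` example is easy to check, and so is mapping a shift-and-subtract sequence back to a multiplier. A small sketch, assuming 32-bit unsigned wraparound like a C `unsigned int`; `shift_sub` and `recover_multiplier` are hypothetical names. The hard part the comment points to is not this algebra but deciding which form the programmer actually wrote.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register

def shift_sub(x: int, k: int, n: int) -> int:
    """Compute (x << k) - x - ... - x with n subtractions, wrapping at 32 bits."""
    x &= MASK32
    r = (x << k) & MASK32
    for _ in range(n):
        r = (r - x) & MASK32
    return r

def recover_multiplier(k: int, n: int) -> int:
    """The multiplication a shift-by-k followed by n subtractions of x equals: 2**k - n."""
    return (1 << k) - n

m = recover_multiplier(4, 2)
assert m == 14
# Holds for negative inputs too, since everything is reduced mod 2**32.
assert all(shift_sub(x, 4, 2) == (x * m) & MASK32 for x in range(-1000, 1000))
print(f"(x << 4) - x - x == x * {m} (mod 2**32)")
```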

[+] sargun|3 years ago|reply
Throwing AI at the problem might not actually be the worst suggestion. I wonder how the likes of Copilot model the AST. Heh, you might even be able to build an approximation of a compiler using AI.
[+] tralarpa|3 years ago|reply
I think it would be easier and faster to just take the millions of open source projects on github for that :)
[+] thesz|3 years ago|reply
The maximum executable size is 2 MB, so it's not possible to torture it with a GHC-compiled Haskell program.
[+] no_time|3 years ago|reply
IDA license sponsored by "Yiang Ling Personal License"?

EDIT: The site has changed in multiple ways in the last 30 minutes that I've been trying to submit my sample. Best of luck keeping up with demand.

[+] psifertex|3 years ago|reply
Nope, Ilfak gave us a license for it, and as Binary Ninja devs we're using a legitimately licensed copy of Binary Ninja as well. All above board, and we're hoping to add more commercial decompilers in the future, as we're able to integrate them and as the companies behind them are willing.

RE: Demand. We just got 2x the workers, but as the East Coast wakes up I'm not confident it'll hold up too well. Several of the decompilers are... VERY resource-intensive, so there's really no good way to scale to heavy demand without an exorbitant amount of compute.

Eventually, a better queue system with better pre-processing to filter out invalid uploads is on our to-do list.
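One way that kind of pre-processing could look: reject uploads over the 2 MB limit mentioned in the thread (read here as 2 MiB, an assumption) and files whose first bytes don't match a known executable header, before any decompiler worker is tied up. The magic numbers are the standard ones for ELF, DOS/PE ("MZ"), and little-endian Mach-O; `prefilter` is a hypothetical name, not dogbolt's code. A byte check like this would also reject headerless DOS .COM files, so a real filter would need an explicit allowance for them.

```python
MAX_UPLOAD_BYTES = 2 * 1024 * 1024  # assumption: "2MB" taken to mean 2 MiB

# Leading "magic" bytes of common executable formats.
MAGIC = {
    b"\x7fELF": "ELF",
    b"MZ": "DOS/PE (MZ)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit, little-endian",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit, little-endian",
}

def prefilter(data: bytes):
    """Return (accepted, reason) for an upload before it is queued for any decompiler."""
    if not data:
        return False, "empty upload"
    if len(data) > MAX_UPLOAD_BYTES:
        return False, f"too large ({len(data)} > {MAX_UPLOAD_BYTES} bytes)"
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return True, name
    return False, "unrecognized file format"

print(prefilter(b"\x7fELF\x02\x01\x01" + bytes(9)))  # (True, 'ELF')
print(prefilter(b"#!/bin/sh\necho hi\n"))             # (False, 'unrecognized file format')
```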

[+] rfoo|3 years ago|reply
Unrelated, but it's amazing that over the years I've seen all kinds of misspellings of "Jiang Ying", and the "ang ing" part is always right. :P
[+] unnouinceput|3 years ago|reply
HN crowd decompiled the website
[+] psifertex|3 years ago|reply
Yeah, sorry about that. We're working on getting it up again, but no promises. I'm on vacation in Europe while the rest of the team is about to head to sleep, so it might be a bit before we have it more stable.
[+] mikewarot|3 years ago|reply
Long, long ago a friend lost the source to a CP/M program, and wrote ReSource to help re-create the 8080 assembler source from the executable. I ported it as Com2Asm, back in the MS-DOS days... I wonder how good things are now.
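Reconstructing assembler source like that starts with a disassembler. Below is a minimal linear-sweep sketch over a hand-picked handful of 8080 opcodes; the table is deliberately incomplete, and the comment doesn't describe how ReSource or Com2Asm worked internally, so this is not their method. It assumes the CP/M convention of loading .COM programs at 0100H. Real tools also add labels for jump targets and separate code from data, which a plain linear sweep cannot do on its own.

```python
# Tiny subset of 8080 opcodes: opcode -> (mnemonic format, operand byte count).
OPCODES = {
    0x00: ("NOP", 0),
    0x0E: ("MVI C,{:02X}H", 1),
    0x11: ("LXI D,{:04X}H", 2),
    0x21: ("LXI H,{:04X}H", 2),
    0x3E: ("MVI A,{:02X}H", 1),
    0x76: ("HLT", 0),
    0xC3: ("JMP {:04X}H", 2),
    0xC9: ("RET", 0),
    0xCD: ("CALL {:04X}H", 2),
}

def disassemble(code: bytes, origin: int = 0x100) -> list:
    """Linear-sweep disassembly; CP/M .COM programs are loaded at 0100H."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        fmt, size = OPCODES.get(op, (None, 0))
        if fmt is None or pc + 1 + size > len(code):
            # Unknown opcode or truncated operand: emit it as a data byte.
            text, size = f"DB {op:02X}H", 0
        elif size:
            # 8080 multi-byte operands are little-endian.
            operand = int.from_bytes(code[pc + 1:pc + 1 + size], "little")
            text = fmt.format(operand)
        else:
            text = fmt
        out.append(f"{origin + pc:04X}  {text}")
        pc += 1 + size
    return out

# MVI C,09H / LXI D,0109H / CALL 0005H / RET: a CP/M BDOS "print string" call.
for line in disassemble(bytes([0x0E, 0x09, 0x11, 0x09, 0x01, 0xCD, 0x05, 0x00, 0xC9])):
    print(line)
```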

How long should I give this thing to run? My upload was 250 KB.

[+] WiSaGaN|3 years ago|reply
Some nice symmetry there:

Decompiler Explorer: dogbolt.org

Compiler Explorer: godbolt.org

[+] chazeon|3 years ago|reply
A few years ago I tried Hex-Rays/IDA, and it gave me reasonable information about program control flow and helped me do hot reverse engineering without source code. A few years later, Hex-Rays/IDA still seems to give the most useful information out there, even for hello-world examples.

I remember one of the projects showing up on my GitHub homepage, but I never tried it. This is probably the only space where I don't feel left behind without constantly following the updates, compared to the JS space, etc.

[+] smcl|3 years ago|reply
Incredible name
[+] lpcvoid|3 years ago|reply
Doesn't this violate (at least) the Hex-Rays license? Fun project, but how is this legal?
[+] psifertex|3 years ago|reply
Nope, not when you ask them and they provide the license.

This is being run with the permission of all the commercial products. In fact, we (Binary Ninja) and Hex-Rays (once I figure out the exact mechanism with Ilfak) are the ones actually paying hosting costs! It's both good for the community and hopefully shows off the value in commercial decompilers. :-)

[+] cinntaile|3 years ago|reply
That probably means they received permission ;)
[+] kangalioo|3 years ago|reply
Hilarious name and fascinating concept, haha. I like it.
[+] 1wd|3 years ago|reply
Are there any decompilers for old x86 DOS COM files?
[+] MattKimber|3 years ago|reply
The commercial versions of IDA still handle COM and MZ executables as far as I'm aware, although support was dropped from the free versions after 5.0. (That version is still available from ScummVM's reverse-engineering page, but it's only a disassembler and doesn't come with the decompiler.)
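The two DOS formats differ at the very first byte: an MZ executable starts with a 28-byte header (signature, page counts, header size in 16-byte paragraphs, initial CS:IP, relocation count, and so on), while a .COM file has no header at all and is copied to offset 100h of a single segment. A minimal sketch of telling them apart; `classify_dos` is a hypothetical helper and the sample header values are made up.

```python
import struct

MZ_HEADER = struct.Struct("<2s13H")  # 28-byte DOS MZ header: signature + 13 words

def classify_dos(data: bytes) -> dict:
    """Tell a DOS MZ executable from a headerless .COM image and summarize it."""
    if len(data) >= MZ_HEADER.size and data[:2] in (b"MZ", b"ZM"):
        (_sig, last_page_bytes, pages, relocs, header_paras,
         _minalloc, _maxalloc, _ss, _sp, _csum, ip, cs, _lfarlc, _ovno) = MZ_HEADER.unpack_from(data)
        # File size per the header: 512-byte pages, the last one possibly partial.
        file_bytes = pages * 512 if last_page_bytes == 0 else (pages - 1) * 512 + last_page_bytes
        header_bytes = header_paras * 16
        return {
            "format": "MZ",
            "header_bytes": header_bytes,
            "image_bytes": file_bytes - header_bytes,
            "entry_cs_ip": (cs, ip),  # relative to the segment the image is loaded at
            "relocations": relocs,
        }
    # .COM: no header; DOS loads the whole file at offset 100h of one segment.
    return {"format": "COM", "load_offset": 0x100, "image_bytes": len(data)}

hdr = MZ_HEADER.pack(b"MZ", 0x50, 2, 0, 2, 0, 0xFFFF, 0, 0xB8, 0, 0x10, 0, 0x1C, 0)
print(classify_dos(hdr + bytes(4)))
print(classify_dos(bytes([0xB4, 0x4C, 0xCD, 0x21])))  # MOV AH,4Ch / INT 21h
```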

Ghidra does a vaguely OK job with MZ executables, provided they've been unpacked first. It really struggles to represent DOS function calls properly; you'll find arguments going missing from the decompiled code. There are some third-party plugins that improve things a bit. It also doesn't have signatures for any of the libraries, so the output will just be a lot of `if ((var26 & 0xFEEE) && var42 > 0xE0) { ... }`, and it's up to you to work out when one of the variables is actually a pointer to video memory or whatever.
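DOS function calls are mostly `INT 21h` with the service number in AH, so even a crude pass can name them. The sketch below is a hypothetical raw-byte heuristic (`annotate_int21` and its look-back window are assumptions), not what Ghidra or its plugins do: it scans for `CD 21` and looks back a few bytes for `MOV AH,imm8` (`B4 xx`) or `MOV AX,imm16` (`B8 lo hi`). A proper implementation would work on decoded instructions and track register values, since data bytes can mimic these opcodes.

```python
# A few well-known INT 21h services, keyed by the value in AH.
DOS_SERVICES = {
    0x01: "read character with echo",
    0x02: "display character",
    0x09: "display $-terminated string",
    0x25: "set interrupt vector",
    0x30: "get DOS version",
    0x35: "get interrupt vector",
    0x3C: "create file",
    0x3D: "open file",
    0x3E: "close file",
    0x3F: "read from file or device",
    0x40: "write to file or device",
    0x4C: "terminate with return code",
}

def annotate_int21(code: bytes, window: int = 8) -> list:
    """Return (offset, AH, service name) for each INT 21h found in raw code bytes.

    Looks back at most `window` bytes, never past the previous INT 21h, for
    MOV AH,imm8 (B4 xx) or MOV AX,imm16 (B8 lo hi).
    """
    notes = []
    prev_end = 0
    for i in range(len(code) - 1):
        if code[i] != 0xCD or code[i + 1] != 0x21:
            continue
        ah = None
        for j in range(i - 2, max(prev_end, i - window) - 1, -1):
            if code[j] == 0xB4:                # MOV AH, imm8
                ah = code[j + 1]
                break
            if code[j] == 0xB8 and j + 2 < i:  # MOV AX, imm16: AH is the high byte
                ah = code[j + 2]
                break
        name = DOS_SERVICES.get(ah, "unknown service") if ah is not None else "AH not found"
        notes.append((i, ah, name))
        prev_end = i + 2
    return notes

# MOV AH,09h / MOV DX,010Ch / INT 21h / MOV AX,4C00h / INT 21h
sample = bytes([0xB4, 0x09, 0xBA, 0x0C, 0x01, 0xCD, 0x21, 0xB8, 0x00, 0x4C, 0xCD, 0x21])
for offset, ah, name in annotate_int21(sample):
    print(f"{0x100 + offset:04X}: INT 21h, AH={ah:02X}h -> {name}")
```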

Reko can also decompile code from this era. It does tend to crash on more complex programs, but it's fine for simple files. It has a similar problem: the decompiled pseudo-C doesn't really illuminate what's going on any more than reading the disassembled x86 assembly and walking through the tricky sections in the DOSBox debugger does. Without all the Win32 API calls that modern programs make, a lot more work is needed to figure out what's going on.

Personally, I find I also end up needing a vintage tool like Sourcer alongside the modern ones, because the newer tools don't annotate things that were common in that era, like directly referencing the BIOS data area or reading the interrupt table from memory rather than using the DOS calls for it. It's either that or spending a lot of your reverse-engineering time discovering how things were done in the DOS days.
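The annotations described here are mostly real-mode address arithmetic: a segment:offset pair maps to the linear address segment * 16 + offset, the interrupt vector table fills the first 1 KB (four bytes per vector, offset word then segment word), and the BIOS data area follows at 0040:0000. A small sketch with a hand-picked, incomplete field table; `label_address` is a hypothetical helper, not a description of Sourcer's behavior.

```python
# Selected BIOS data area fields, by offset within segment 0040h.
BDA_FIELDS = {
    0x10: "equipment list word",
    0x13: "conventional memory size in KB",
    0x17: "keyboard shift flags",
    0x49: "current video mode",
    0x6C: "timer tick count (dword)",
}

def label_address(segment: int, offset: int) -> str:
    """Label a real-mode segment:offset reference the way an era-aware tool might."""
    linear = ((segment << 4) + offset) & 0xFFFFF  # 20-bit address, wraps like an 8086
    if linear < 0x400:
        vector, part = divmod(linear, 4)  # each IVT entry: offset word, then segment word
        return f"IVT INT {vector:02X}h {'offset' if part < 2 else 'segment'}"
    if linear < 0x500:
        rel = linear - 0x400
        field = BDA_FIELDS.get(rel)
        return f"BDA 0040:{rel:04X}" + (f" ({field})" if field else "")
    if 0xB8000 <= linear < 0xC0000:
        return f"color text video RAM +{linear - 0xB8000:04X}h"
    return f"linear {linear:05X}h"

print(label_address(0x0000, 0x0084))  # IVT INT 21h offset
print(label_address(0x0040, 0x006C))  # BDA 0040:006C (timer tick count (dword))
print(label_address(0xB800, 0x00A0))  # color text video RAM +00A0h
```

The same BDA field is reached whether the code writes `0040:006C` or `0000:046C`, which is exactly the kind of aliasing a modern decompiler leaves for you to spot.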

[+] mobilio|3 years ago|reply
You can use old versions of IDA, IDA for DOS, or, I think, Sourcer.
[+] Terry_Roll|3 years ago|reply
BinaryNinja: Error decompiling: Traceback (most recent call last):
  File "decompile_bn.py", line 66, in <module>
    main()
  File "decompile_bn.py", line 13, in main
    t = tempfile.NamedTemporaryFile()
  File "/usr/local/lib/python3.8/tempfile.py", line 531, in NamedTemporaryFile
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 117, in _sanitize_params
    dir = gettempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 286, in gettempdir
    tempdir = _get_default_tempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 218, in _get_default_tempdir
    raise FileNotFoundError(_errno.ENOENT,
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/decompiler_user']

Hex-Rays: Error decompiling: /tmp/tmpanbyzjw9/tmpqx8sjhpv: is not decompilable

angr and Ghidra still waiting at 150 seconds and counting...

320 seconds and counting...

Boomerang: Error decompiling: Traceback (most recent call last):
  File "decompile_boomerang.py", line 57, in <module>
    main()
  File "decompile_boomerang.py", line 14, in main
    with tempfile.TemporaryDirectory() as tempdir:
  File "/usr/local/lib/python3.8/tempfile.py", line 780, in __init__
    self.name = mkdtemp(suffix, prefix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 347, in mkdtemp
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 117, in _sanitize_params
    dir = gettempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 286, in gettempdir
    tempdir = _get_default_tempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 218, in _get_default_tempdir
    raise FileNotFoundError(_errno.ENOENT,
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/decompiler_user']

RecStudio: Error decompiling: Traceback (most recent call last):
  File "decompile_recstudio.py", line 59, in <module>
    main()
  File "decompile_recstudio.py", line 14, in main
    with tempfile.TemporaryDirectory() as tempdir:
  File "/usr/local/lib/python3.8/tempfile.py", line 780, in __init__
    self.name = mkdtemp(suffix, prefix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 347, in mkdtemp
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 117, in _sanitize_params
    dir = gettempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 286, in gettempdir
    tempdir = _get_default_tempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 218, in _get_default_tempdir
    raise FileNotFoundError(_errno.ENOENT,
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/decompiler_user']

Reko: Error decompiling: Traceback (most recent call last):
  File "decompile_recstudio.py", line 59, in <module>
    main()
  File "decompile_recstudio.py", line 14, in main
    with tempfile.TemporaryDirectory() as tempdir:
  File "/usr/local/lib/python3.8/tempfile.py", line 780, in __init__
    self.name = mkdtemp(suffix, prefix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 347, in mkdtemp
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
  File "/usr/local/lib/python3.8/tempfile.py", line 117, in _sanitize_params
    dir = gettempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 286, in gettempdir
    tempdir = _get_default_tempdir()
  File "/usr/local/lib/python3.8/tempfile.py", line 218, in _get_default_tempdir
    raise FileNotFoundError(_errno.ENOENT,
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/decompiler_user']

RetDec and Snowman are the only ones that worked on the sample app I supplied.

If I get time, I'll upload another app that introduces a new technique to test it.

I know for a fact Ghidra should work because I've used it myself.
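The tracebacks above all end in Python's `tempfile.gettempdir()`, which raises once none of its candidate directories (here `/tmp`, `/var/tmp`, `/usr/tmp`, and the worker user's home) accepts a test file. The thread doesn't say why that happened; a full disk on a busy worker is one plausible cause, but that is an assumption. A worker could run a pre-flight check like this sketch before accepting a job and pass the chosen directory explicitly via `dir=`, since `gettempdir()` caches its first answer in `tempfile.tempdir`. `tempdir_usable` and `pick_tempdir` are hypothetical names.

```python
import os
import tempfile

def tempdir_usable(path: str) -> bool:
    """True if a temp file can be created, written, and synced in `path`."""
    try:
        with tempfile.TemporaryFile(dir=path) as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk so a full filesystem is more likely to surface
        return True
    except OSError:  # covers FileNotFoundError, PermissionError, and disk-full errors
        return False

def pick_tempdir(candidates):
    """Return the first usable directory, or raise so the worker can refuse new jobs."""
    for path in candidates:
        if tempdir_usable(path):
            return path
    raise RuntimeError(f"no usable temp dir among {candidates!r}")

with tempfile.TemporaryDirectory() as scratch:
    print(pick_tempdir(["/nonexistent/decompiler_tmp", scratch]) == scratch)  # True
```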

[+] psifertex|3 years ago|reply
Was this due to load or server restarts, or are you still seeing errors? If you don't mind, pass me a GUID, either publicly or privately (my Twitter handle accepts DMs, or you can email any address at my handle plus .com as the domain), and I can take a closer look.
[+] Terry_Roll|3 years ago|reply
Nice phish! That will come in handy.