bri3d|13 days ago

Claude is doing the decompilation here, right? Has this been compared against running a traditional decompiler first, with Claude in the loop to improve its output and make sure the results still match? I would think Claude's training data includes far more pseudo-C <-> C pairs than pairs of GCC 2.7 MIPS assembly and C, so even if the traditional decompiler is kind of bad at N64 code, it would be more efficient to fix bad decompiler C than to start from assembly.
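
As a rough illustration of what "ensure matched results" could look like in such a loop, here is a minimal sketch that scores how closely the assembly from a recompiled candidate lines up with the original listing. Everything in it is an assumption made for illustration: the function names `normalize`, `match_score`, and `first_mismatch` are hypothetical, comments are assumed to start with `#`, and a real matching-decompilation setup would compare compiled object code from the original toolchain rather than text listings.

```python
import difflib


def normalize(asm: str) -> list[str]:
    """Drop '#' comments and blank lines, and collapse runs of whitespace."""
    lines = []
    for raw in asm.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(" ".join(line.split()))
    return lines


def match_score(target_asm: str, candidate_asm: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means the listings are identical."""
    a, b = normalize(target_asm), normalize(candidate_asm)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def first_mismatch(target_asm: str, candidate_asm: str):
    """Return (index, target_line, candidate_line) for the first difference, or None."""
    a, b = normalize(target_asm), normalize(candidate_asm)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    if len(a) != len(b):
        # One listing is a strict prefix of the other.
        i = min(len(a), len(b))
        return i, (a[i] if i < len(a) else None), (b[i] if i < len(b) else None)
    return None


if __name__ == "__main__":
    target = "addiu $sp, $sp, -24  # prologue\nsw $ra, 20($sp)\njr $ra\n"
    candidate = "addiu $sp, $sp, -24\nsw $ra, 16($sp)\njr $ra\n"
    print(match_score(target, candidate))
    print(first_mismatch(target, candidate))
```

The first mismatching line is the kind of feedback that could be handed back to the model on each iteration, whether the starting point is raw assembly or decompiler pseudo-C.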

titzer|13 days ago

It's wild to me that they wouldn't try this first. Feeding the asm directly into the model seems like intentionally ignoring a huge amount of work that has gone into traditional decompilation. What LLMs excel at (names, context, searching in high-dimensional space, making shit up) is very different from, e.g., coming up with an actual AST with infix expressions that represents asm code.

skerit|12 days ago

I've been doing some decompilation with Ghidra. Unfortunately, it's a C++ game, which Ghidra isn't really great at, so Claude gets a bit confused by it all too. But all in all, it does work, and I've been able to reconstruct a ton of things already.

sestep|13 days ago

One of the other PhD students in my department has an NDSS 2026 paper about combining the strengths of both LLMs and traditional decompilers! https://lukedramko.github.io/files/idioms.pdf

suprjami|13 days ago

Not Claude, but there are open-weight LLMs trained specifically on Ghidra decomp and tested on their ability to help reverse engineers make sense of it:

https://huggingface.co/LLM4Binary/llm4decompile-22b-v2

There's also a dataset floating around HF which is... I think a popular N64 decomp to pseudo-C? Maybe the Mario one?