For anyone else who was initially confused by this, useful context is that Snowboard Kids 2 is an N64 game.
I also wasn't familiar with this terminology:
> You hand it a function; it tries to match it, and you move on.
In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
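As a concrete sketch, the byte-for-byte comparison at the heart of "matching" looks something like this (illustrative only; real projects diff the recompiled function's bytes against the original ROM, and the byte sequences below are made up):

```python
def first_mismatch(original: bytes, recompiled: bytes):
    """Return the offset of the first differing byte, or None when the
    recompiled function matches the original exactly."""
    for i, (a, b) in enumerate(zip(original, recompiled)):
        if a != b:
            return i
    if len(original) != len(recompiled):
        return min(len(original), len(recompiled))
    return None

# Made-up stand-ins for the function's machine code bytes.
target = bytes.fromhex("27bdffe8afbf0014")   # bytes from the original ROM
attempt = bytes.fromhex("27bdffe8afbf0010")  # bytes from the recompiled C

offset = first_mismatch(target, attempt)     # 7: last byte differs, not a match
```

A match is simply `first_mismatch(...) is None`; anything else tells the decompiler where to look next.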
Snowboard Kids 2 was a great N64 game. It was one of a number of racing titles inspired by Mario Kart, but the snowboarding added a bit of a different feel. The battle items were clever, and the stages were really well made given the technical limitations they faced. As a kid with two brothers, we played a lot of competitive multiplayer.
I also remember a few things in the singleplayer being very difficult. The number of times I had to fight/race Dameian in his giant robot running down the mountainside... It's carved into my brain like that footrace against Wizpig in DKR or the Donkey Kong arcade game for the Rareware coin in DK64.
The battle items in Snowboard Kids were clever and memorable. The parachute missile that would launch racers into the air and then deploy a parachute so they slowly floated back down was such a frustrating item to be hit with. The pan that would hit all opponents was iconic, and it was hilarious that you could somehow dodge it with invisibility. Even the basic rock dropped on the course was somehow memorable.
Great game. It's heartwarming to know that others still remember it and care about it.
We've been using LLMs for security research (finding vulnerabilities in ML frameworks) and the pattern is similar - it's surprisingly good at the systematic parts (pattern recognition, code flow analysis) when you give it specific constraints and clear success criteria.
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed entire codebases and ask 'trace this user input to potential sinks' which would be tedious manually. Not perfect, but genuinely useful when combined with human validation.
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
I'd like to see this given a bit more structure, honestly. What occurs to me is constraining the grammar for LLM inference to ensure valid C89 (or close to it, as much as can be checked without compilation), then perhaps experimentally switching to a permuter if a certain accuracy threshold is reached for the decompiled function.
Eventually some or many of these attempts would, of course, fail, and require programmer intervention, but I suspect we might be surprised how far it could go.
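A minimal sketch of that proposed structure, including the give-up threshold from the article. Everything here is a hypothetical stand-in: `llm_attempt`, `score_match`, and `run_permuter` are stubs that simulate steadily improving attempts, not real APIs.

```python
# Stubs simulating the real pieces: an LLM call, a byte-match scorer,
# and a permuter. Each simulated "attempt" pretends to get 30% closer.
_calls = 0

def llm_attempt(asm, feedback=None):
    return f"s32 func(void) {{ /* candidate for: {asm} */ }}"

def score_match(asm, src):
    global _calls
    _calls += 1
    return min(1.0, 0.3 * _calls)    # fraction of bytes that match

def run_permuter(src):
    return src  # a real permuter would mutate src until the bytes match

def decompile(asm, max_attempts=10, permuter_threshold=0.9):
    """Iterate LLM attempts; hand near-matches to a permuter; give up
    after max_attempts so tokens aren't wasted on hopeless functions."""
    best_score, best_src = 0.0, None
    for _ in range(max_attempts):
        src = llm_attempt(asm, feedback=best_src)
        score = score_match(asm, src)
        if score > best_score:
            best_score, best_src = score, src
        if best_score == 1.0:
            return best_src                  # exact match: done
        if best_score >= permuter_threshold:
            return run_permuter(best_src)    # close enough: permute the rest
    return None                              # give up; needs a human

result = decompile("glabel func_80012345")
```

The `return None` branch is where programmer intervention would kick in.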
> In decompilation "matching" means you found a function block in the machine code, wrote some C, then confirmed that the C produces the exact same binary machine code once it is compiled.
They had access to the same C compiler used by Nintendo in 1999? And the register allocation on a MIPS CPU is repeatable enough to get an exact match? That's impressive.
It's worth noting here that the author came up with a handful of good heuristics to guide Claude and a very specific goal, and the LLM did a good job given those constraints. Most seasoned reverse engineers I know have found similar wins with those in place.
What LLMs are (still?) not good at is one-shot reverse engineering for understanding by a non-expert. If that's your goal, don't blindly use an LLM. People already know that getting an LLM to write prose or code unsupervised can go badly, but it's worth remembering that doing this for decompilation is even harder :)
Agree with this. I'm a software engineer who hasn't had to manage memory for most of my career.
I asked Opus how hard it would be to port the script extender for Baldur's Gate 3 from Windows to the native Linux build. It outlined that it would be very difficult for someone without reverse engineering experience, and correctly pointed out that they use different compilers, so it's not a simple mapping exercise. Its recommendation was not to try unless I was a Ghidra master and had lots of time on my hands.
> The ‘give up after ten attempts’ threshold aims to prevent Claude from wasting tokens when further progress is unlikely. It was only partially successful, as Claude would still sometimes make dozens of attempts.
Not what I would have expected from a 'one-shot'. Maybe self-supervised would be a more suitable term?
Meh, the main idea of one-shot is that you prompted it once and got a good impl when it decided it was done. As opposed to having to workshop yourself with additional prompts to fix things.
It doesn't do it in one-shot on the GPU either. It feeds outputs back into inputs over and over. By the time you see tokens as an end-user, the clanker has already made a bunch of iterations.
I’ve been having fun sending Claude down the old school MUD route, giving it access to a SMAUG derivative and once it’s mastered the play, give it admin powers to create new play experiences.
I stayed away from decompilation and reverse engineering, for legal reasons.
Claude is amazing. It can sometimes get stuck in a reasoning loop, but it will break away, reassess, and continue on until it finds its way.
Claude was murdered in a dark instance dungeon when it managed to defeat the dragon but ran out of lamp oil and torches to find its way out. Because of the light system it kept getting “You can’t seem to see anything in the darkness” and randomly walked into a skeleton lair.
Super fun to watch from an observer. Super terrifying that this will replace us at the office.
I've been experimenting with running Claude in headless mode + a continuous loop to decompile N64 functions and the results have been pretty incredible. (This is despite already using Claude in my decompilation workflow).
One thing I do find annoying in really old sources is that sometimes you can't go function by function, because the code will occasionally just use a random register to pass results. Passing the whole file works better at that point.
This sounds interesting! Do you have a good introduction to N64 decompilation? Would you recommend using Claude right from the start, or rather trying to get to know the ins and outs of N64 decomp first?
This is super cool! I would be curious to see how Gemini 3 fares… I've found it to be even more effective than Opus 4.5 at technical analysis (in another domain).
The article is a useful resource for setting up automated flows, and Claude is great at assembly. Codex less so; Gemini is also good at assembly and will happily hand-roll x86_64 bytecode. Codex appears optimized for more "mainstream" dev tasks, and excels at that. If only Gemini had a great agent...
I ran Node with --print-opt-code and had Opus look at Turbofan's output. It was able to add comments to the JIT'ed code and give suggestions on how to improve the JavaScript for better optimization.
There are quite a few comments here on code obfuscation.
The hardest form of code obfuscation is homomorphic computing: code transformed to act on encrypted data isomorphically to the way regular code acts on regular data. The homomorphic code is hard-obfuscated by this transformation.
Now create a homomorphic virtual machine that operates on encrypted code over encrypted data. Very hard to understand.
Now add data encryption/decryption algorithms, both homomorphically encrypted to be run by the virtual machine, to prepare and recover the inputs, outputs, or effects of any data or event information for the homomorphic application code. Now that all data within the system is encrypted by means that are hard-obfuscated, running on code that is hard-obfuscated, the entire system becomes hard^2 (not a formal measure) opaque.
This isn't realistic in practice. Homomorphic implementations of even simple functions are extremely inefficient for the time being. But it is possible, and improvements in efficiency have not been exhausted.
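The underlying trick can be seen in a toy example. Textbook (unpadded) RSA is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product, so computation happens on encrypted data without the secret key. (Toy parameters, wildly insecure, and real FHE schemes support both addition and multiplication; this only illustrates the homomorphic property.)

```python
# Textbook RSA with toy parameters: n = p*q, e*d = 1 mod (p-1)(q-1).
p, q = 61, 53
n = p * q                                # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))        # modular inverse (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

# Multiply two *ciphertexts*: the result decrypts to the product of the
# plaintexts -- computing on encrypted data without ever decrypting it.
product_ct = (enc(6) * enc(7)) % n
assert dec(product_ct) == 42
```

Whoever performs the multiplication learns nothing about 6, 7, or 42; only the key holder can decrypt the result.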
Equivalent but different implementations of homomorphic code can obviously be made. However, given that the only credible explanation for the new code's design decisions is exactly matching the original code, this precludes any "clean room" defense.
--
Implementing software with neural network models wouldn't stop replication, but it would decompile into source that was clearly not developed independently of the original implementation.
Even distilling (training a new model on the "decompiled" model) would be a dead giveaway that it was derived directly from the source, not a clean-room implementation.
--
I have wondered whether quantum computing would enable an efficient version of homomorphic computing over classical data.
I'm an encryption noob. Less than a noob. But something I've been wondering about is how homomorphic computing can be opaque/unencryptable.
If you are able to monitor what happens to encrypted data being processed by an LLM, could you not match that with the same patterns created by unencrypted data?
Real simple example, let's say I have a program that sums numbers. One sends the data to an LLM or w/e unencrypted, the other encrypted.
Wouldn't the same part of the LLM/compute machine "light up" so to speak?
I used Gemini to compare the minimized output of the Rollup vs Rolldown JavaScript bundlers to find locations where the latter was not yet at the same degree of optimization. It was astoundingly good and I'm not sure how I would have been able to accomplish the task without an LLM as an available tool.
Yeah, it works great for porting as well. I tried it on the assembly sources of Prince of Persia for the Apple II and went from nothing to the basics being playable (with a few bugs, but still) on a modern Mac with SDL graphics within a day.
Wow, I haven't thought of this game since I played it as a kid. My friend would bring it over all the time for sleep overs. I'm going to try to emulate it right now for old time's sake. I loved this game.
Am I just wrong in thinking doing decompilation of copyrighted code via the cloud is a bad idea?
Like, if it ever leaks, or you were planning on releasing it, literally every step you took in your crime is uploaded to the cloud ready to send you to prison.
It's what's stopped me from using hosted LLMs for DMCA-legal RE. All it takes is for a prosecutor/attorney to spin a narrative based on uploaded evidence and your ass is in court.
It wouldn't fit most current LLM cloud providers' narrative about privacy and copyright either, so I'm not sure they would be as cooperative with a prosecutor as they are today with lawmakers and rights holders.
Rather than insisting on byte-perfect matches, sometimes you can prove equivalence of machine code sequences using SAT solvers. That might be an interesting extension, maybe giving clearer code output and/or solutions to difficult functions in some cases.
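For intuition, here is the property a SAT/SMT solver (e.g. Z3 over bit-vectors) would prove symbolically, checked by brute force instead on a classic strength-reduction rewrite (both functions are made-up examples):

```python
def original(x: int) -> int:
    return (x * 5) & 0xFF            # what the source computes

def optimized(x: int) -> int:
    return ((x << 2) + x) & 0xFF     # what the compiler emitted instead

# A solver proves this symbolically for all inputs; for a single 8-bit
# input we can simply enumerate every value and check the two agree.
equivalent = all(original(x) == optimized(x) for x in range(256))
assert equivalent
```

The appeal over byte matching is that equivalence survives different register allocation or instruction selection, at the cost of no longer certifying a byte-identical build.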
The author's previous post explains this all in a bunch more detail: https://blog.chrislewis.au/using-coding-agents-to-decompile-...
See also, "zero-shot" / "few-shot" etc.
I hope that others find this similarly useful.
It's good at cleaning up decompiled code, at figuring out what functions do, at uncovering weird assembly tricks and more.
Anyway, we're reaching the point where documentation can be generated by LLMs and this is great news for developers.
Just some wild thoughts.
https://github.com/NagyD/SDLPoP