If you want to obtain a "C" pseudocode, you can give a wasm file to wasm2c [1].
You can re-obtain a WebAssembly folded-expression text format using wasm2wat [1].
You can obtain a call-graph from a WebAssembly module by generating the wat representation using wasm2wat and pasting it into main.wat on https://webassembly.studio/ (-> Empty Wat Project). Then save and build; right click the new main.wasm and select "Generate Call Graph."
That said, check out this encrypted and anonymous "pastebin" I built [2], with the crypto written in Rust and bindings generated using wasm-bindgen [3]. It is surprisingly hard to debug when optimized using wasm-opt [4].
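For anyone who wants to script the two wabt tools mentioned above, here is a minimal sketch in Python. The helper names `wabt_commands` and `run_wabt` are made up for illustration, and the sketch assumes `wasm2c` and `wasm2wat` are installed and on the PATH. It only builds the command lines and runs whichever tools it finds.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def wabt_commands(wasm_path: str) -> List[List[str]]:
    """Build command lines for wasm2c and wasm2wat (hypothetical helper)."""
    wasm = Path(wasm_path)
    return [
        # C output, written next to the input file
        ["wasm2c", str(wasm), "-o", str(wasm.with_suffix(".c"))],
        # text format, with folded expressions where possible
        ["wasm2wat", "--fold-exprs", str(wasm), "-o", str(wasm.with_suffix(".wat"))],
    ]


def run_wabt(wasm_path: str) -> None:
    """Run each tool that is installed; skip the ones that are missing."""
    for cmd in wabt_commands(wasm_path):
        if shutil.which(cmd[0]) is None:
            print(f"skipping {cmd[0]}: not found on PATH")
            continue
        subprocess.run(cmd, check=True)
```

Calling `run_wabt("module.wasm")` would then leave `module.c` and `module.wat` next to the input, provided both tools are available.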
WebAssembly is not "simple to work with", especially when it comes to analyzing non-trivial, large, optimized programs. The tool [1] generates a one-to-one translation of wasm instructions into C code. I guess you could qualify that as a "decompiler", but real decompilers - the ones used for malware analysis, such as JEB or IDA - are optimizing decompilers whose output is higher-level (i.e. more legible) than the input disassembly/binary.
It looks like in the future everyone is going to compile their favorite language to WebAssembly.
Am I the only one who feels like it's the end of the web as we knew it in the 90s and 00s, where you could open any web page, understand how it works and learn from it?
I think that disassembling WebAssembly is easier than trying to make sense of a highly minified/obfuscated JavaScript bundle. The time of easily readable web code has been dead for quite a while now, I would say.
I don't think it's that different from minified and sometimes intentionally obfuscated JS. In fact, I think it's been getting better recently, as tools have been developed to debug that kind of code: source maps, etc.
If it means the end of JavaScript, I'm all in. Let's get back to having real application languages for applications, and markup languages for text.
I hope that we'll come around to the idea that this two-decade-long fascination with abusing the hell out of web technology was a fever dream, and go on to build something better on more substantial foundations.
I think it's quite the opposite in terms of languages. JavaScript will have the best WASM interop story for quite a while.
As WASM gets adopted we'll see it get used in all sorts of places outside the browser. Many projects need a high-level scripting language and JavaScript will be the obvious choice.
Well, someone is bound to make disassemblers/decompilers/debuggers for us to dissect the innards of the VM.
As a side note, I am half expecting the announcement of a WebAssembly ISA any day.
WASM instructions are fairly straightforward, so an obfuscator can be written quite easily. I could easily create a proxy tool that introduces randomization/non-determinism on a per-download basis if it were worth it. There is no execution of arbitrary memory, so there are limits. JS can create new WASM modules and link them at runtime, but invocations across import/export might incur a performance hit. But moving functions around, subdividing functions, etc. is really easy.
Also, the paper has Emscripten-specific reverse engineering details (such as the locations in memory where the stack starts vs. where the heap starts) that don't apply to many other WASM compilers.
My reading of the disassembled code on page 7 is that the "end" opcode actually takes one byte, and isn't just a syntactic structure that the assembler removes. As if a Lisp VM had a close-paren instruction.
+003Eh: i32.eqz
+003Fh: if $3
+0041h: br $2 (---> break out of $2 (BLOCK))
+0043h: end
+0044h: get_local $12
What's the purpose of having an "end" opcode? Is there no overhead at runtime because it evaporates when the code is compiled? Is it to avoid having forward-referencing offsets in the code? Is it just in there for verification purposes?
It's kind of like a "comefrom" opcode, a target that other opcodes jump to (or after)!
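On the encoding question above: in the WebAssembly binary format, `end` is a real one-byte opcode (0x0B), and `block`/`loop`/`if` carry no forward offset, so a decoder finds the matching `end` by counting nesting. The sketch below is a toy decoder in Python that reproduces the offsets in the quoted listing. The byte values in `SNIPPET` are an assumption reconstructed from those offsets (in particular the empty block type 0x40 and the `br` depth of 1 are guesses), and immediates are treated as single bytes, which only holds for a snippet this small.

```python
# Opcode values from the WebAssembly binary format. Each entry is
# (mnemonic, number of one-byte immediates); real immediates are LEB128.
OPCODES = {
    0x02: ("block", 1),
    0x03: ("loop", 1),
    0x04: ("if", 1),
    0x0B: ("end", 0),
    0x0C: ("br", 1),
    0x20: ("local.get", 1),  # printed as get_local by older tools
    0x45: ("i32.eqz", 0),
}
OPENERS = {"block", "loop", "if"}


def disassemble(code: bytes, base: int = 0):
    """Return (offset, mnemonic, nesting depth) for each instruction."""
    rows = []
    depth = 0
    pc = 0
    while pc < len(code):
        name, n_imm = OPCODES[code[pc]]
        if name == "end":
            depth -= 1  # `end` closes the innermost open block
        rows.append((base + pc, name, depth))
        if name in OPENERS:
            depth += 1
        pc += 1 + n_imm
    return rows


# Assumed encoding of the five quoted instructions, starting at +003Eh.
SNIPPET = bytes([0x45, 0x04, 0x40, 0x0C, 0x01, 0x0B, 0x20, 0x0C])

for offset, name, depth in disassemble(SNIPPET, base=0x3E):
    print(f"+{offset:04X}h: {'  ' * depth}{name}")
```

The one-byte gap between `+0043h` and `+0044h` is the `end` opcode itself, which is consistent with the reading that it occupies space in the binary rather than being purely syntactic.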
I have an end opcode in the TXR Lisp virtual machine. It delimits blocks of code that have some sort of context attached.
The opcode allows the virtual machine interpreter to recurse on itself; when it hits the end, the dispatch loop executes return to bail out to the higher level of recursion. Thus end is also useful for exiting the top-level invocation of the VM. It is required, in fact; if the end instruction is not present, the interpreter will keep marching through memory past the end of the routine. No wasteful check is needed whether the instruction pointer is past the code block.
My end instruction also specifies a result value (because the machine is register-based; there is no top-of-stack implicit value). This becomes the return value of a procedure when the final end is executed. The block instruction also uses it. When a (block ...) is compiled, the return value of the ordinary block termination is specified in the end instruction at the end of the block. Control returns to the block instruction which receives that value.
end has something in common with the x86 ret instruction and its ilk. It's not so much an exotic "come from" as an ordinary "return".
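The recursion described above can be illustrated with a toy register machine in Python. This is not the TXR Lisp VM's actual code: the instruction names and the tuple encoding are hypothetical and chosen only to show `block` re-entering the dispatch loop and `end` returning a register value to it.

```python
from typing import List, Tuple

Instr = Tuple  # hypothetical encoding: (opcode_name, operand, ...)


def run(code: List[Instr], regs: List[int], pc: int = 0) -> Tuple[int, int]:
    """Dispatch loop: execute from pc until an `end`, then return
    (result value, index of the instruction after that `end`)."""
    while True:  # no bounds check: every code path must reach an `end`
        op = code[pc]
        if op[0] == "mov":        # mov dst, immediate
            regs[op[1]] = op[2]
            pc += 1
        elif op[0] == "add":      # add dst, src_a, src_b
            regs[op[1]] = regs[op[2]] + regs[op[3]]
            pc += 1
        elif op[0] == "block":    # block dst: run the body in a nested loop
            value, pc = run(code, regs, pc + 1)
            regs[op[1]] = value   # control is back at `block`, which stores the value
        elif op[0] == "end":      # end src: leave this level of recursion
            return regs[op[1]], pc + 1
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")


program = [
    ("mov", 0, 2),
    ("block", 1),      # result of the block lands in r1
    ("mov", 2, 40),
    ("add", 3, 0, 2),
    ("end", 3),        # block result: r3
    ("end", 1),        # procedure result: r1
]
regs = [0, 0, 0, 0]
result, _ = run(program, regs)
print(result)  # 42
```

In Python a missing final `end` raises an IndexError instead of marching through memory, but the shape is the same: the loop itself never tests whether the program counter is past the code.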
KenanSulayman|7 years ago
[1] Part of WebAssembly Binary Toolkit: https://github.com/WebAssembly/wabt
[2] Source code on GitHub: https://github.com/psychonautwiki/impis/blob/master/core/src... (demo paste: https://imp.is/n/7NFsfEiCjkFBVgC6A4JS6GyqN7puN5Sg7ed11m8VrtT...)
[3] https://github.com/rustwasm/wasm-bindgen
[4] Part of Binaryen: https://github.com/WebAssembly/binaryen
piphf|7 years ago
mbebenita|7 years ago
ttoinou|7 years ago
emily-c|7 years ago
sp332|7 years ago
megaman22|7 years ago
I'm probably going to be disappointed...
pjmlp|7 years ago
digi_owl|7 years ago
trgv|7 years ago
My understanding (maybe wrong) was that this was going to be available in the browser.
mchahn|7 years ago
Web browsers already show (or will show) wasm disassembly when a module is opened in the browser tools. A file can contain label metadata, which makes it very readable.
87|7 years ago
z3phyr|7 years ago
kodablah|7 years ago
DonHopkins|7 years ago
kazinator|7 years ago
Maijin212|7 years ago
saredust|7 years ago
[deleted]