What’s in that .wasm? Introducing wasm-decompile

[+] snazz|6 years ago|reply

This looks much nicer than the wasm2c output for that binary. I compiled it with `clang wasm.c -c -target wasm32 -O2` just like in the instructions (I'm on LLVM 10), and used the latest wasm2wat with `wasm2wat -f wasm.o` and got this instead:

  (module
    (type (;0;) (func (param i32 i32) (result f32)))
    (import "env" "__linear_memory" (memory (;0;) 0))
    (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
    (func (;0;) (type 0) (param i32 i32) (result f32)
      (f32.add
        (f32.add
          (f32.mul
            (f32.load
              (local.get 0))
            (f32.load
              (local.get 1)))
          (f32.mul
            (f32.load offset=4
              (local.get 0))
            (f32.load offset=4
              (local.get 1))))
        (f32.mul
          (f32.load offset=8
            (local.get 0))
          (f32.load offset=8
            (local.get 1))))))

wasm2c (also from WABT) returns this thing: https://paste.linux.community/view/7877995f
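
For comparison, here is a minimal C sketch of what the folded WAT above computes: it loads three f32 values at byte offsets 0, 4, and 8 from each of the two pointer parameters and sums the pairwise products, which reads like a three-component dot product. The names `vec3` and `dot` are assumptions made for illustration; the original wasm.c is not shown in this thread.

```c
/* Hypothetical reconstruction: the names vec3 and dot are assumptions,
   since the thread does not show the original wasm.c. */
typedef struct {
    float x, y, z;   /* f32 fields at byte offsets 0, 4, and 8 */
} vec3;

float dot(const vec3 *a, const vec3 *b) {
    return a->x * b->x    /* f32.load,          both pointers */
         + a->y * b->y    /* f32.load offset=4, both pointers */
         + a->z * b->z;   /* f32.load offset=8, both pointers */
}
```

Compiling something of this shape with the clang command quoted above should give a similar load/multiply/add tree, though the exact output may vary with the LLVM version.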

[+] Aardappel|6 years ago|reply

wasm2c has a different objective though: to be recompile-able again while preserving semantics. wasm-decompile was designed for readability first.

[+] klodolph|6 years ago|reply

This is fascinating. For various reasons, WASM is less like a target bytecode format and more like a peculiar IR for compilers. I’m sure this has all sorts of effects on the tooling.

[+] k__|6 years ago|reply

What's the difference?

[+] mmastrac|6 years ago|reply

This is super handy. Pseudocode is very useful for understanding flow - so much more than actual assembly. I've always found it an order of magnitude easier to understand bad asm-to-C decompilation from IDA or Ghidra than perfect disassembly.

[+] dlojudice|6 years ago|reply

> Decompile to what?

> `wasm-decompile` produces output that tries to look like a "very average programming language" while still staying close to the Wasm it represents.

> #1 goal is readability

> #2 goal is to still represent Wasm as 1:1 as possible

It seems AssemblyScript would do the job [1].

[1] https://assemblyscript.org/

[+] Aardappel|6 years ago|reply

AssemblyScript would certainly do worse at #2, and possibly also at #1. Translating to Wasm and translating from Wasm lead to different optimal designs; see, for example, how these two systems deal with loads and stores.

[+] 3pt14159|6 years ago|reply

It would be nice if the decompiled output were runnable through an interpreter so you could step through it with a debugger of some kind and rename or annotate the variables and functions as you reverse engineer what is going on.

[+] Aardappel|6 years ago|reply

I'm the author, if anyone has specific questions :)

[+] 6nf|6 years ago|reply

I notice that your code supports the 'name' custom section as expected, and furthermore you support a few other custom sections too - 'dylink' for example. Where did you find the documentation for these sections? The reason I ask is that I don't believe the official WebAssembly specs talk about those sections, so I guess they are somewhat compiler-specific perhaps?
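
For readers unfamiliar with the term: in the Wasm binary format, a custom section is a section with id 0 whose contents start with a length-prefixed name (such as "name" or "dylink"), followed by an arbitrary payload. The sketch below walks the sections of a module held in memory and collects the custom section names. It is only an illustration under stated assumptions: `read_u32_leb` and `list_custom_sections` are hypothetical helpers written for this example, not WABT functions, and names are truncated to 31 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned LEB128 u32 starting at buf[*pos].
   Returns 0 on success, -1 on truncated or over-long input. */
static int read_u32_leb(const uint8_t *buf, size_t len, size_t *pos,
                        uint32_t *out) {
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*pos >= len) return -1;
        uint8_t byte = buf[(*pos)++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *out = result;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical helper: store the names of custom sections (id 0) into
   names[], up to max_names entries. Returns the number of names stored,
   or -1 if the module is malformed. */
int list_custom_sections(const uint8_t *buf, size_t len,
                         char names[][32], int max_names) {
    /* Every module starts with "\0asm" followed by version 1. */
    static const uint8_t header[8] = {0x00, 'a', 's', 'm', 1, 0, 0, 0};
    if (len < 8 || memcmp(buf, header, 8) != 0) return -1;

    size_t pos = 8;
    int count = 0;
    while (pos < len) {
        uint8_t id = buf[pos++];            /* one-byte section id */
        uint32_t size;                      /* section size in bytes */
        if (read_u32_leb(buf, len, &pos, &size) != 0) return -1;
        if (size > len - pos) return -1;
        size_t end = pos + size;

        if (id == 0) {                      /* custom section */
            size_t p = pos;
            uint32_t name_len;
            if (read_u32_leb(buf, end, &p, &name_len) != 0) return -1;
            if (name_len > end - p) return -1;
            if (count < max_names) {
                size_t n = name_len < 31 ? name_len : 31;
                memcpy(names[count], buf + p, n);
                names[count][n] = '\0';
                count++;
            }
        }
        pos = end;                          /* skip the section body */
    }
    return count;
}
```

Called on the bytes of a .wasm file with a `char names[8][32]` buffer, this fills in the names of up to eight custom sections. How the payload of each section is interpreted is up to the tool that defines it.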

[+] ellis0n|6 years ago|reply

I'm new to the wasm code base. Where is the export code generation located? How complex would it be to rewrite the export? I want to export wasm to the .acpul programming language so that wasm modules can run on the animation cpu platform. Links to architecture schemas & docs would be great.

[+] hardwaregeek|6 years ago|reply

Loving the tooling around wasm getting better. I've been debugging my compiler output with hexl-mode and reading the binary format and while it's not that bad, it'd be nice to do more advanced debugging with a text format.

I also saw a project that intended to visualize WebAssembly's execution. That'd be extremely helpful too.

[+] cfallin|6 years ago|reply

> reading the binary format ... it'd be nice to do more advanced debugging with a text format.

Do you know about `wasm2wat` (from the WebAssembly Binary Toolkit, "WABT")? It produces a 1-to-1 text representation of the bytecode and is meant to always roundtrip via `wat2wasm` back to the same bytecode.

[+] irrational|6 years ago|reply

When I first started learning JavaScript in the late 90s, the primary way I learned new things was from reading other people's code in my browser. Nowadays this isn't as easy, since you often have to run obfuscated code through a prettifier to get it back into a human-readable format, but it is still possible with some effort. I was concerned that WASM would make this impossible (despite the stated goal of "Be readable and debuggable — WebAssembly is a low-level assembly language, but it does have a human-readable text format (the specification for which is still being finalized) that allows code to be written, viewed, and debugged by hand."), but wasm-decompile gives me hope.

https://developer.mozilla.org/en-US/docs/WebAssembly/Concept...

[+] fowl2|6 years ago|reply

Can we compile it back to wasm again? ;P

[+] frosted-flakes|6 years ago|reply

No.

> Its #1 goal is readability: help guide readers understand what is in a .wasm with as easy to follow code as possible. Its #2 goal is to still represent Wasm as 1:1 as possible, to not lose its utility as a disassembler. Obviously these two goals are not always unifiable.

> This output is not meant to be an actual programming language and there is currently no way to compile it back into Wasm.

[+] saagarjha|6 years ago|reply

Ooh, this is nice! No more having to read wasm2wat’s mildly annoying format.

[+] cjbprime|6 years ago|reply

FWIW there's also wasm2c and wasm2js out there :)
