herobird's comments
herobird | 1 month ago | on: What happened to WebAssembly
Most Wasm proposals are very elegantly designed and effective, meaning they provide a lot of value for relatively little specification bloat. Examples include tail-calls, multi-value, custom-page-sizes, memory64, and even gc.
However, the simd and relaxed-simd proposals increased spec bloat by a lot, are not future-proof, and caused more fragmentation due to non-determinism. In my opinion, work should have focused on flexible-vectors (SVE-like), which is more aligned with Wasm's original goal of near-native performance. The reason for this development was that simd was simpler to implement, so users could reap its benefits earlier. Unfortunately, it seems the existence of simd completely stalled development of the superior flexible-vectors proposal.
If flexible-vectors (or something similar) is ever stabilized, we will end up in one of two (bad) scenarios:
1) People will have to choose between simd and flexible-vectors when compiling, depending on their target hardware, which runs completely counter to Wasm's original goals.
2) The simd proposal will be mostly unused and deprecated. Dead weight.
herobird | 2 months ago | on: Is Mozilla trying hard to kill itself?
herobird | 3 months ago | on: Wasmi 1.0 – WebAssembly Interpreter Stable at Last
herobird | 8 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
However, execution speed is just one metric that might matter.
For example, Wasmi's lazy startup time is much better (~100-1000x) since it does not have to produce machine code. This can lead to cases where Wasmi has finished executing while Wasmtime is still generating machine code.
Old post with some measurements: https://wasmi-labs.github.io/blog/posts/wasmi-v0.32/
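As a rough back-of-envelope model of that trade-off (all numbers below are made up for illustration, not measurements): a compiling runtime pays an up-front compilation cost but executes faster afterwards, while an interpreter starts almost immediately but executes slower, so for small enough workloads the interpreter finishes first.

```rust
// Illustrative break-even model between an interpreter and a JIT.
// Total time for W units of work:
//   jit    = compile_cost + W / jit_speed
//   interp = W / interp_speed
fn total_time_jit(compile_ms: f64, work: f64, jit_speed: f64) -> f64 {
    compile_ms + work / jit_speed
}

fn total_time_interp(work: f64, interp_speed: f64) -> f64 {
    work / interp_speed
}

fn main() {
    let compile_ms = 10.0;  // assumed one-time compilation cost
    let jit_speed = 10.0;   // assumed work units per ms after compilation
    let interp_speed = 1.0; // assumed work units per ms when interpreting
    // Break-even amount of work W solves: W/interp_speed = compile_ms + W/jit_speed
    let break_even = compile_ms / (1.0 / interp_speed - 1.0 / jit_speed);
    println!("break-even work: {break_even:.2} units");
    // Short-running program: the interpreter is done before the JIT.
    let w = 5.0;
    assert!(total_time_interp(w, interp_speed) < total_time_jit(compile_ms, w, jit_speed));
}
```

With these invented numbers the break-even point is ~11 work units; anything shorter favors the interpreter, which is why startup-heavy usage patterns benchmark so differently from long-running ones.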
Always benchmark and choose the best tool for your usage pattern.
herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
There is a variety of ways to implement fuel metering, with varying trade-offs in performance, determinism, and precision.
In this comment I roughly described how Wasmi implements its fuel metering: https://news.ycombinator.com/item?id=44229953
Wasmi's design focuses on performance and determinism but is less precise since instructions are always charged as a group.
herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
From past experiments I remember that fuel metering adds roughly 5-10% overhead to Wasmi executions. The trick is not to bump or decrease a counter for every single executed instruction but instead to group instructions into so-called basic blocks and charge the counter once for the whole group.
This is also the approach that is implemented by certain Wasm tools to add fuel metering to an existing Wasm binary.
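A minimal sketch of this grouping idea (a toy bytecode, not Wasmi's actual implementation; the one-fuel-per-instruction cost model is an assumption): the fuel cost of each straight-line block is precomputed, so the interpreter checks and decrements the counter once per block instead of once per instruction.

```rust
// Toy illustration of per-basic-block fuel metering (not Wasmi's real code).
enum Op { Add(u64), Mul(u64) }
enum Terminator { Jump(usize), Halt }

// A basic block: straight-line ops plus a precomputed fuel cost.
struct Block { ops: Vec<Op>, term: Terminator, fuel_cost: u64 }

impl Block {
    fn new(ops: Vec<Op>, term: Terminator) -> Self {
        // Assumed cost model: 1 fuel per op plus 1 for the terminator.
        let fuel_cost = ops.len() as u64 + 1;
        Self { ops, term, fuel_cost }
    }
}

fn run(blocks: &[Block], mut fuel: u64) -> Result<u64, &'static str> {
    let (mut pc, mut acc) = (0, 0u64);
    loop {
        let block = &blocks[pc];
        // One fuel check and one decrement per block, not per instruction.
        if fuel < block.fuel_cost {
            return Err("out of fuel");
        }
        fuel -= block.fuel_cost;
        for op in &block.ops {
            match op {
                Op::Add(n) => acc += *n,
                Op::Mul(n) => acc *= *n,
            }
        }
        match block.term {
            Terminator::Jump(target) => pc = target,
            Terminator::Halt => return Ok(acc),
        }
    }
}

fn main() {
    // A terminating program: (0 + 2) * 3 = 6.
    let halting = vec![Block::new(vec![Op::Add(2), Op::Mul(3)], Terminator::Halt)];
    assert_eq!(run(&halting, 10), Ok(6));
    // An infinite loop is stopped deterministically once fuel runs out.
    let looping = vec![Block::new(vec![Op::Add(1)], Terminator::Jump(0))];
    assert_eq!(run(&looping, 10), Err("out of fuel"));
}
```

Since fuel is only inspected at block boundaries, the result is deterministic but imprecise: execution can overrun by at most one block's worth of instructions.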
herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
Wasmi's fuel metering can be thought of as follows: there is an adjustable counter, and for each instruction Wasmi executes, this counter is decreased by some amount. If it reaches 0, the resumable call yields back to the host (in this case the OS), which can then decide how, or whether, the call shall be resumed.
For efficiency reasons, fuel metering in Wasmi is not implemented exactly as described above, but I wanted to provide a simple description.
With this, one no longer relies on clocks or other measures to give each call its own time slice: each Wasm app is provided an amount of fuel that can be renewed (or not) when it runs out. This makes it useful for building a Wasm scheduler.
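As a toy sketch of that scheduling idea (this is not Wasmi's actual API; the App type, the Step enum, and the one-fuel-per-step cost model are all invented for illustration): each app runs until it either finishes or exhausts its fuel budget, and the scheduler renews the budget on the next round-robin slot.

```rust
// Toy cooperative scheduler driven by fuel, not Wasmi's real API.
enum Step { Yielded, Done(u64) }

struct App { counter: u64, target: u64 }

impl App {
    // Run until done or the fuel budget is exhausted, then yield.
    fn run(&mut self, mut fuel: u64) -> Step {
        while fuel > 0 {
            self.counter += 1; // one unit of work costs one fuel
            fuel -= 1;
            if self.counter == self.target {
                return Step::Done(self.counter);
            }
        }
        Step::Yielded // out of fuel: yield control back to the host
    }
}

fn main() {
    let mut apps = vec![App { counter: 0, target: 9 }, App { counter: 0, target: 5 }];
    let budget = 4; // fuel granted per scheduling slot
    let mut results: Vec<Option<u64>> = vec![None, None];
    // Round-robin: renew each unfinished app's fuel every round.
    while results.iter().any(|r| r.is_none()) {
        for (i, app) in apps.iter_mut().enumerate() {
            if results[i].is_none() {
                if let Step::Done(v) = app.run(budget) {
                    results[i] = Some(v); // app finished
                }
            }
        }
    }
    assert_eq!(results, vec![Some(9), Some(5)]);
    println!("all apps finished");
}
```

No clocks or signals are involved: fairness comes entirely from the fuel budget, which also makes the interleaving fully deterministic.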
herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
This is really interesting, and I was wondering how you implemented that using Wasmi. It seems the code for that is here:
https://github.com/Askannz/munal-os/blob/2d3d361f67888cb2fe8...
It might interest you that newer versions of Wasmi (v0.45+) extended the resumable function call feature to make it possible to yield upon running out of fuel: https://docs.rs/wasmi/latest/wasmi/struct.TypedFunc.html#met...
Seeing that you are already using Wasmi's fuel metering, this might be a more efficient or failure-proof approach for executing Wasm apps in steps.
An example for how to do this can be found in Wasmi's own Wast runner: https://github.com/wasmi-labs/wasmi/blob/019806547aae542d148...
herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing
I just watched the demo video of Munal OS and am still in awe of all of its features. Really impressive work!
herobird | 1 year ago | on: Ways to generate SSA
herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever
Though I have to say that the "list of addresses" approach is not optimal in Rust today since Rust lacks explicit tail calls. Stitch applies some tricks to achieve tail calls in Rust, but these have drawbacks that are discussed in detail in Stitch's README.
Furthermore, the "list of addresses" approach (also known as threaded-code dispatch) has some variance. From what I know, both Wasm3 and Stitch use direct-threaded code, which stores a list of function pointers to instruction handlers and uses tail calls or computed goto to fetch the next instruction. The downside compared to bytecode is that direct-threaded code uses more memory, and it is only faster when coupled with computed goto or tail calls. Otherwise, compilers nowadays are pretty solid at optimizing loop-switch constructs and could technically even generate computed-goto-like code.
Thus, due to the lower memory usage, the downsides of tail calls in Rust, and the potential of compiler optimizations for loop-switch constructs, we went with the bytecode approach in Wasmi.
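For illustration, a minimal loop-switch ("loop + match") dispatcher of the kind described above might look like this (a toy stack bytecode, unrelated to Wasmi's actual instruction set): opcodes are plain enum values rather than function pointers, which keeps the bytecode compact.

```rust
// Toy loop-switch bytecode dispatch: one loop, one match per instruction.
#[derive(Clone, Copy)]
enum Op { Push(i64), Add, Mul, Halt }

fn execute(code: &[Op]) -> i64 {
    let mut stack = Vec::new();
    let mut ip = 0; // instruction pointer
    loop {
        match code[ip] {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
            Op::Halt => return stack.pop().unwrap(),
        }
        // Optimizing compilers often lower this loop + match into a jump
        // table, approaching computed-goto-style dispatch.
        ip += 1;
    }
}

fn main() {
    // (2 + 3) * 4 = 20
    let code = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt];
    assert_eq!(execute(&code), 20);
}
```

A direct-threaded dispatcher would instead store a function pointer per instruction and tail-call from handler to handler, trading memory (and, in Rust, reliance on tail-call tricks) for potentially faster dispatch.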
herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever
The abundance of Wasm runtimes is a testament to how great the WebAssembly standard really is!
herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever
When using lazy-unchecked translation with relatively small programs, setting up the Linker can sometimes take the majority of overall execution time with ~50 host functions (a common average number). We are talking about microseconds, but microseconds start to matter at these scales. This is why we implemented the LinkerBuilder for Wasmi, for a 120x speed-up. :)
herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever
The non-WASI test cases only measure translation performance, so their imports do not need to be satisfied. They would have to be if the benchmarks measured instantiation performance instead. Usually, though, instantiation is pretty fast for most Wasm runtimes compared to translation.
WebAssembly MVP is a good example: it offered limited initial value but was exceptionally simple. Overall, I am happy with how the spec evolved, with the exceptions of 128-bit simd and relaxed-simd.
The main issue I see with 128-bit simd is that it was always clear it would not be the final vector extension. Modern hardware already widely supports 256-bit vector widths, with 512-bit becoming more common. Thus, 128-bit simd increasingly delivers only a fraction of native performance rather than the often-cited "near-native" performance. A flexible-vectors design (similar to ARM SVE or the RISC-V vector extension) could have provided a single, future-proof SIMD model and preserved "near-native" performance for much longer.
From a long-term perspective, this feels like a trade of short-term value for a large portion of the spec's complexity budget. That said, I may be underestimating the real challenges for JIT implementers, and I am likely biased as the author of a Wasm interpreter, where flexible-vectors would be far more beneficial than 128-bit simd.
Why do you think flexible-vectors might never have a realistic path to standardization?