herobird's comments

herobird | 1 month ago | on: What happened to WebAssembly

My view on specifications is that their long-term success depends on the value they provide relative to their complexity. Complexity inevitably grows over time, so spending that complexity budget carefully is crucial, especially since a specification is only useful if it remains implementable by a broad set of engines.

WebAssembly MVP is a good example: it offered limited initial value but was exceptionally simple. Overall, I am happy with how the spec evolved with the exceptions of 128-bit simd and relaxed-simd.

The main issue I see with 128-bit simd is that it was always clear it would not be the final vector extension. Modern hardware already widely supports 256-bit vector widths, with 512-bit becoming more common. Thus, 128-bit simd increasingly delivers only a fraction of native performance rather than the often-cited "near-native" performance. A flexible-vectors design (similar to ARM SVE or the RISC-V vector extension) could have provided a single, future-proof SIMD model and preserved "near-native" performance for much longer.

From a long-term perspective, this feels like a trade-off of short-term value for a large portion of the spec's complexity budget. Though, I may be underestimating the real challenges for JIT implementers, and I am likely biased being the author of a Wasm interpreter where flexible-vectors would be far more beneficial than 128-bit simd.

Why do you think flexible-vectors might never have a realistic path to standardization?

herobird | 1 month ago | on: What happened to WebAssembly

> There is a lot of desire for advancement, but standardization means decisions are hard to reverse. For many, things are moving too quickly and in the wrong direction.

Most Wasm proposals are very elegantly designed and effective - meaning they provide lots of value for relatively minor specification bloat. Examples are tail-calls, multi-value, custom-page-sizes, memory64 and even gc.

However, the simd and relaxed-simd proposals increased spec bloat by a lot, are not future-proof, and caused additional fragmentation due to non-determinism. In my opinion, work should have focused on flexible-vectors (SVE-like), which was more aligned with Wasm's original goal of near-native performance. The reason for this development was that simd was simpler to implement, so users could reap its benefits earlier. Unfortunately, it seems the existence of simd completely stalled development of the superior flexible-vectors proposal.

If flexible-vectors (or something similar) is ever stabilized, we will end up in one of two (bad) scenarios:

1) People will have to decide between simd and flexible-vectors for their compilation, depending on their target hardware, which runs totally against Wasm's original goals.

2) The simd proposal will be mostly unused and deprecated. Dead weight.

herobird | 2 months ago | on: Is Mozilla trying hard to kill itself?

It's kinda frustrating that Mozilla's CEO thinks that axing ad-blockers would be financially beneficial for them. Quite the opposite is true (I believe) since a ton of users would leave Firefox for alternatives.

herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing

Wasmtime, being an optimizing JIT, usually is ~10 times faster than Wasmi during execution.

However, execution is just one metric that might be of importance.

For example, Wasmi's lazy startup time is much better (~100-1000x) since it does not have to produce machine code. This can result in cases where Wasmi is done executing while Wasmtime is still generating machine code.

Old post with some measurements: https://wasmi-labs.github.io/blog/posts/wasmi-v0.32/

Always benchmark and choose the best tool for your usage pattern.

herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing

Yes, the rationale is to provide a pragmatic and efficient solution to infinite loops.

There is a variety of ways to implement fuel metering with varying trade-offs, e.g. performance, determinism and precision.

In this comment I roughly described how Wasmi implements its fuel metering: https://news.ycombinator.com/item?id=44229953

Wasmi's design focuses on performance and determinism but isn't as precise, since instructions are always charged as a group.

herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing

I don't know how fuel metering in Wasmtime works or what its overhead is, but keep in mind that Wasmi is an interpreter-based Wasm runtime, whereas Wasmtime generates machine code (JIT).

In past experiments, I remember fuel metering adding roughly 5-10% overhead to Wasmi executions. The trick is to not bump or decrease a counter for every single executed instruction, but instead to group instructions into so-called basic blocks and update the counter once for the whole group.

This is also the approach that is implemented by certain Wasm tools to add fuel metering to an existing Wasm binary.
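The basic-block trick can be sketched with a toy stack interpreter (hypothetical opcodes, not Wasmi's actual IR): a `ConsumeFuel` pseudo-instruction at the start of each block charges the whole block's cost in one subtraction, instead of checking fuel on every instruction.

```rust
// Hypothetical instruction set for a toy stack interpreter.
enum Op {
    Push(i64),
    Add,
    // Charge `n` units of fuel for the basic block that follows.
    ConsumeFuel(u64),
}

struct OutOfFuel;

/// Executes `ops`, charging fuel once per basic block
/// instead of once per instruction.
fn run(ops: &[Op], mut fuel: u64) -> Result<i64, OutOfFuel> {
    let mut stack: Vec<i64> = Vec::new();
    for op in ops {
        match op {
            Op::ConsumeFuel(n) => {
                // One check per block: cheap compared to per-instruction checks.
                fuel = fuel.checked_sub(*n).ok_or(OutOfFuel)?;
            }
            Op::Push(v) => stack.push(*v),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
        }
    }
    Ok(stack.pop().unwrap())
}
```

The trade-off mentioned above is visible here: if execution traps mid-block, the whole block's fuel has already been charged, which is why this scheme is fast and deterministic but not instruction-precise.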

herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing

Great question!

Wasmi's fuel metering can be thought of as follows: there is an adjustable counter, and for each instruction that Wasmi executes, this counter is decreased by some amount. If it reaches 0, the resumable call yields back to the host (in this case the OS), where it can be decided how, or if, the call shall be resumed.

For efficiency reasons, fuel metering in Wasmi is not implemented exactly as described above, but I wanted to provide a simple description.

With this, one no longer relies on clocks or other measures to give each call its own time frame: each Wasm app is provided an amount of fuel that can be renewed (or not) when it runs out. This makes it useful for building a Wasm scheduler.
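A minimal sketch of such a scheduler, with a hypothetical counting loop standing in for a Wasm app (this is not Wasmi's API, just the yield-on-empty-fuel idea): each turn, an app runs until its fuel slice is exhausted, then yields its saved position back to the host.

```rust
// Toy resumable execution: a step runs until its fuel budget is exhausted,
// then yields back to the host with the saved instruction pointer.
enum Step {
    Yielded { pc: usize }, // ran out of fuel; resume later from `pc`
    Finished(u64),         // completed with a result
}

/// Hypothetical workload standing in for a Wasm app:
/// advances `pc` towards `total`, one unit of fuel per step.
fn run_step(mut pc: usize, total: usize, mut fuel: u64) -> Step {
    while pc < total {
        if fuel == 0 {
            return Step::Yielded { pc };
        }
        fuel -= 1;
        pc += 1;
    }
    Step::Finished(pc as u64)
}

/// A minimal round-robin scheduler: every app gets `slice` fuel per turn,
/// with no reliance on clocks or timers.
fn schedule(mut apps: Vec<(usize, usize)>, slice: u64) -> Vec<u64> {
    let mut done: Vec<Option<u64>> = vec![None; apps.len()];
    while done.iter().any(|d| d.is_none()) {
        for (i, (pc, total)) in apps.iter_mut().enumerate() {
            if done[i].is_some() {
                continue;
            }
            match run_step(*pc, *total, slice) {
                Step::Yielded { pc: new_pc } => *pc = new_pc,
                Step::Finished(result) => done[i] = Some(result),
            }
        }
    }
    done.into_iter().map(|d| d.unwrap()).collect()
}
```

Because progress is measured in fuel rather than wall-clock time, the interleaving is fully deterministic, which is exactly what makes this attractive for sandboxed scheduling.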

herobird | 9 months ago | on: Show HN: Munal OS: a graphical experimental OS with WASM sandboxing

> Every iteration of the loop polls the network and input drivers, draws the desktop interface, runs one step of each active WASM application, and flushes the GPU framebuffer.

This is really interesting and I was wondering how you implemented that using Wasmi. Seems like the code for that is here:

https://github.com/Askannz/munal-os/blob/2d3d361f67888cb2fe8...

It might interest you that newer versions of Wasmi (v0.45+) extended the resumable function call feature to make it possible to yield upon running out of fuel: https://docs.rs/wasmi/latest/wasmi/struct.TypedFunc.html#met...

Seeing that you are already using Wasmi's fuel metering, this might be a more efficient or failure-proof approach to executing Wasm apps in steps.

An example for how to do this can be found in Wasmi's own Wast runner: https://github.com/wasmi-labs/wasmi/blob/019806547aae542d148...

herobird | 1 year ago | on: Ways to generate SSA

Have you had experience implementing SSA via a sea-of-nodes representation? Could it be that, in this case, dominance frontiers are no longer important, and one could use the simpler SSA construction algorithms that do not require them?

herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever

Yes, this iterative process is indeed very visible. Wasmi started out as a mostly-safe Rust interpreter and over time moved more and more in a performance-oriented direction.

Though I have to say that the "list of addresses" approach is not optimal in Rust today, since Rust is missing explicit tail calls. Stitch applies some tricks to achieve tail calls in Rust, but these have drawbacks that are discussed in detail in Stitch's README.

Furthermore, the "list of addresses" approach (also known as threaded-code dispatch) comes in several variants. From what I know, both Wasm3 and Stitch use direct threaded code, which stores a list of function pointers to instruction handlers and uses tail calls or computed-goto to fetch the next instruction. The downside compared to bytecode is that direct threaded code uses more memory, and it is only faster when coupled with computed-goto or tail calls. Otherwise, compilers nowadays are pretty solid at optimizing loop-switch constructs and can technically even generate computed-goto-like code.

Thus, due to the lower memory usage, the downsides of using tail calls in Rust, and the potential of compiler optimizations for loop-switch constructs, we went for the bytecode approach in Wasmi.
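For illustration, a loop-switch dispatcher in Rust is just a `match` inside a `loop` over compact bytecode (toy instruction set, not Wasmi's actual one); this is the construct that compilers can lower to a jump table:

```rust
// Compact bytecode: one small enum variant per instruction.
// An enum is denser than the function pointers of direct threaded code.
#[derive(Copy, Clone)]
enum Inst {
    Const(i64),
    Add,
    Halt,
}

/// Loop-switch dispatch: a single `loop` with a `match` on the
/// current instruction. Compilers typically lower this to a jump table.
fn execute(code: &[Inst]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0;
    loop {
        match code[ip] {
            Inst::Const(v) => stack.push(v),
            Inst::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Inst::Halt => return stack.pop().unwrap(),
        }
        ip += 1;
    }
}
```

A direct-threaded variant would instead store a handler function pointer per instruction and jump from handler to handler, which is where the extra memory and the tail-call requirement come from.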

herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever

No, I do not, but it is a very interesting question, and probably not even answerable in practice, because not every instruction takes the same amount of time to execute to completion. Outliers in this regard are, for example, host function calls, which can do arbitrary things on the host side, or bulk-memory operations, which scale linearly with their inputs.

herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever

Thank you! :)

When using lazy-unchecked translation with relatively small programs, setting up the Linker can sometimes take up the majority of the overall execution time with ~50 host functions (which is a common average number). We are talking about microseconds, but microseconds start to matter at these scales. This is why we implemented the LinkerBuilder for Wasmi, for a 120x speed-up. :)

herobird | 1 year ago | on: Wasmi v0.32: WebAssembly interpreter is now faster than ever

I am aware of Wizard and I think it is a pretty interesting Wasm runtime. It would be really cool if it was part of Wasmi's benchmark testsuite (https://github.com/wasmi-labs/wasmi-benchmarks). Contributions to add more Wasm runtimes and more test cases are very welcome.

The non-WASI test cases only test translation performance, thus their imports do not need to be satisfied. That would only have been necessary if the benchmarks tested instantiation performance instead. Instantiation is usually pretty fast for most Wasm runtimes, though, compared to translation time.
