The biggest motivator for me is that the WASM sandbox provides true deterministic execution. Unlike engines such as Temporal, using hashmaps is 100% deterministic here. Attempting to spawn a thread is a compile error.
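As general Rust background (not Obelisk's mechanism): a `HashMap`'s iteration order depends only on its hasher keys and the sequence of operations. `HashMap::new()` uses `RandomState`, which draws fresh keys per instance. Pinning the keys with `BuildHasherDefault<DefaultHasher>` makes the order repeatable for the same binary and the same inserts. The assumption here is that a WASM host gets a similar effect by controlling the randomness a guest can see; the names `FixedMap` and `iteration_order` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// A HashMap whose hasher always starts from the same fixed SipHash keys,
/// instead of the per-instance random keys used by `RandomState`.
type FixedMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Inserts the keys in order and returns the map's iteration order.
fn iteration_order(keys: &[&str]) -> Vec<String> {
    let mut map: FixedMap<String, usize> = HashMap::default();
    for (i, k) in keys.iter().enumerate() {
        map.insert(k.to_string(), i);
    }
    map.keys().cloned().collect()
}

fn main() {
    let keys = ["alpha", "beta", "gamma", "delta", "epsilon"];
    // Same binary, same inserts: the order is identical on every call and every run.
    assert_eq!(iteration_order(&keys), iteration_order(&keys));
    println!("{:?}", iteration_order(&keys));
}
```

This guarantee only holds within one compiled binary. A different hasher implementation or standard library version can still change the order, which is the sharp edge raised in the reply below.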
It also performs well; the bottleneck is the write throughput of SQLite.
Last but not least, all the interfaces between workflows and activities are type-safe, described in a WIT schema.
AlotOfReading|10 months ago
It's certainly close enough that calling it deterministic isn't misleading (though I'd stop short of "true determinism"), but there are still sharp edges here with things like hashmaps (e.g. by recompiling: https://dev.to/gnunicorn/hunting-down-a-non-determinism-bug-...).
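One common defensive pattern in Rust, offered here as a general sketch rather than anything the thread says Obelisk requires: when logic depends on iteration order, do not rely on `HashMap` order at all. Either sort the entries first or use a `BTreeMap`, which iterates in key order regardless of the hasher or the standard library's table layout. The function names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Order-sensitive output derived from a HashMap: sort first so the result
/// does not depend on the hasher or on how the table is laid out internally.
fn summary_sorted(counts: &HashMap<String, u32>) -> Vec<(String, u32)> {
    let mut entries: Vec<(String, u32)> = counts.iter().map(|(k, v)| (k.clone(), *v)).collect();
    entries.sort();
    entries
}

/// A BTreeMap iterates in key order by construction, so no sort is needed.
fn summary_btree(counts: &BTreeMap<String, u32>) -> Vec<(String, u32)> {
    counts.iter().map(|(k, v)| (k.clone(), *v)).collect()
}

fn main() {
    let pairs = [("b", 2), ("a", 1), ("c", 3)];
    let hm: HashMap<String, u32> = pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    let bt: BTreeMap<String, u32> = pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    // Both approaches yield the same, hasher-independent order.
    assert_eq!(summary_sorted(&hm), summary_btree(&bt));
    println!("{:?}", summary_btree(&bt)); // [("a", 1), ("b", 2), ("c", 3)]
}
```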
tomasol|10 months ago
Although I don't expect this to be an issue in practice, Obelisk checks that the replay is deterministic and fails the workflow when an unexpected event is triggered. It should also be possible to add an automatic replay of each finished execution to verify determinism, e.g. while testing.
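A minimal sketch of the two checks described above, assuming a simple append-only event log. `Event`, `Replayer`, `verify_replay` and the event variants are hypothetical and are not Obelisk's API. During a live run, new events are appended. During replay, each emitted event must match the recorded one at the same position. When verifying a finished execution, any extra or missing event is also an error.

```rust
/// Hypothetical events a workflow can emit into its execution log.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Event {
    ActivityCalled { name: String, input: String },
    DelayRequested { millis: u64 },
}

#[derive(Debug, PartialEq, Eq)]
enum ReplayError {
    /// The workflow emitted a different event than the one recorded at `index`.
    Mismatch { index: usize, expected: Event, actual: Event },
    /// The workflow emitted an event past the end of a finished execution's log.
    Unexpected { index: usize, actual: Event },
    /// The workflow stopped before replaying every recorded event.
    Incomplete { replayed: usize, recorded: usize },
}

struct Replayer {
    log: Vec<Event>,
    pos: usize,
    strict: bool, // true when verifying a finished execution: no new events allowed
}

impl Replayer {
    fn emit(&mut self, actual: Event) -> Result<(), ReplayError> {
        let index = self.pos;
        if index < self.log.len() {
            // Replaying: the event must match the recorded one exactly.
            if self.log[index] != actual {
                return Err(ReplayError::Mismatch { index, expected: self.log[index].clone(), actual });
            }
        } else if self.strict {
            return Err(ReplayError::Unexpected { index, actual });
        } else {
            // Live execution: record the new event.
            self.log.push(actual);
        }
        self.pos += 1;
        Ok(())
    }
}

/// Re-runs a finished execution against its recorded log and fails on any divergence.
fn verify_replay<F>(log: &[Event], workflow: F) -> Result<(), ReplayError>
where
    F: Fn(&mut Replayer) -> Result<(), ReplayError>,
{
    let mut r = Replayer { log: log.to_vec(), pos: 0, strict: true };
    workflow(&mut r)?;
    if r.pos != r.log.len() {
        return Err(ReplayError::Incomplete { replayed: r.pos, recorded: r.log.len() });
    }
    Ok(())
}

/// Example workflow: reserve, wait, charge.
fn order_workflow(r: &mut Replayer) -> Result<(), ReplayError> {
    r.emit(Event::ActivityCalled { name: "reserve".into(), input: "item-1".into() })?;
    r.emit(Event::DelayRequested { millis: 1_000 })?;
    r.emit(Event::ActivityCalled { name: "charge".into(), input: "item-1".into() })
}

fn main() {
    // Live run: the log starts empty and events are appended.
    let mut live = Replayer { log: Vec::new(), pos: 0, strict: false };
    order_workflow(&mut live).unwrap();
    // Test-time verification: replaying the finished execution must reproduce the log.
    assert_eq!(verify_replay(&live.log, order_workflow), Ok(()));
}
```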
Edit: Enabling the flags here: https://github.com/obeli-sk/obelisk/pull/67
tomasol|10 months ago
All responses and completed delays are stored in a table with an auto-incremented id, so `-await-next` will always resolve to the same value on replay.
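A minimal in-memory sketch of why an auto-incremented id makes awaiting deterministic. `ResponseTable`, `JoinSetCursors`, `await_next` and the `Response` variants are hypothetical stand-ins, not Obelisk's schema or API. The id fixes the order in which responses were recorded, and each join set consumes responses strictly in id order. As a result, the original run and any later replay see the same sequence. Children and delays are kept in separate join sets here, since mixing them in one join set is noted below as not yet implemented.

```rust
use std::collections::HashMap;

/// Hypothetical response recorded for a join set.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Response {
    ChildFinished { child: String, value: String },
    DelayFinished { delay: String },
}

/// Stand-in for a responses table with an auto-incremented id column.
#[derive(Default)]
struct ResponseTable {
    rows: Vec<(u64, String, Response)>, // (id, join_set, response); ids strictly increasing
    next_id: u64,
}

impl ResponseTable {
    fn insert(&mut self, join_set: &str, response: Response) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.rows.push((id, join_set.to_string(), response));
        id
    }
}

/// Per-join-set cursor: the last response id each join set has consumed.
#[derive(Default)]
struct JoinSetCursors {
    last_seen: HashMap<String, u64>,
}

impl JoinSetCursors {
    /// Returns the first response for `join_set` with an id above the last one consumed.
    fn await_next(&mut self, table: &ResponseTable, join_set: &str) -> Option<Response> {
        let last = self.last_seen.get(join_set).copied().unwrap_or(0);
        let (id, _, resp) = table.rows.iter().find(|(id, js, _)| *id > last && js == join_set)?;
        self.last_seen.insert(join_set.to_string(), *id);
        Some(resp.clone())
    }
}

fn main() {
    let mut table = ResponseTable::default();
    // Whatever order results arrive in, the id fixes it once they are stored.
    table.insert("js1", Response::ChildFinished { child: "child-b".into(), value: "2".into() });
    table.insert("js2", Response::DelayFinished { delay: "timeout".into() });
    table.insert("js1", Response::ChildFinished { child: "child-a".into(), value: "1".into() });

    // The original run and a replay both consume js1 in id order.
    let mut first_run = JoinSetCursors::default();
    let mut replay = JoinSetCursors::default();
    for _ in 0..2 {
        assert_eq!(first_run.await_next(&table, "js1"), replay.await_next(&table, "js1"));
    }
}
```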
As you mention, putting a persistent sleep and a child execution into the same join set is not yet implemented.
genuine_smiles|10 months ago
Which circumstances?
jcmfernandes|10 months ago
So, I like this idea, I really do. At the same time, in the short term, WASM is relatively messy and, in my opinion, too immature (as an ecosystem) for prime time. But with that out of the way (it will eventually come), you'll have to tell people that they can't use any code that relies on threads, so they'd better know whether any of the libraries they use do. How do you foresee navigating this? Runtime errors suck, especially in this context, as fixing them requires either live-patching code or migrating execution logs to new code versions.
tomasol|10 months ago
There is always this chicken-and-egg problem on a new platform, but I am hoping that LLMs can partially solve it: the activities are just HTTP clients with no complex logic.
Regarding the restrictions required for determinism, they only apply to workflows, not activities. Workflows should describe just the business logic. All the complexities of retries, failure recovery, replay after a server crash, etc. are handled by the runtime. The WASM sandbox makes it impossible to introduce non-determinism: it would cause a compile error, so there is no need for runtime checks.
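To illustrate the division of labor described above, here is a sketch of the runtime, not the workflow, owning retries of a failing activity. The `RetryPolicy` fields, the doubling backoff, and `run_with_retries` are hypothetical and are not Obelisk's configuration or API. A real runtime would persist each attempt and sleep durably. This sketch only records the backoff delays it would have scheduled.

```rust
use std::time::Duration;

/// Hypothetical retry policy applied by a runtime to an activity call.
struct RetryPolicy {
    max_attempts: u32,
    initial_backoff: Duration,
}

impl RetryPolicy {
    /// Backoff before retry number `retry` (1-based); doubles each time.
    fn backoff(&self, retry: u32) -> Duration {
        self.initial_backoff * 2u32.pow(retry - 1)
    }
}

/// Calls `activity` (given the 1-based attempt number) until it succeeds or the
/// attempts run out. Returns the final result and the backoff delays scheduled.
fn run_with_retries<T, E>(
    policy: &RetryPolicy,
    mut activity: impl FnMut(u32) -> Result<T, E>,
) -> (Result<T, E>, Vec<Duration>) {
    let mut delays = Vec::new();
    let mut attempt = 1;
    loop {
        match activity(attempt) {
            Ok(v) => return (Ok(v), delays),
            Err(e) if attempt >= policy.max_attempts => return (Err(e), delays),
            Err(_) => {
                delays.push(policy.backoff(attempt));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let policy = RetryPolicy { max_attempts: 4, initial_backoff: Duration::from_millis(100) };
    // A flaky activity (e.g. an HTTP call) that fails twice, then succeeds.
    let (result, delays) = run_with_retries(&policy, |attempt| {
        if attempt < 3 { Err("503 Service Unavailable") } else { Ok("order confirmed") }
    });
    assert_eq!(result, Ok("order confirmed"));
    assert_eq!(delays, vec![Duration::from_millis(100), Duration::from_millis(200)]);
}
```

The workflow only ever observes the final result. Retries stay in the non-deterministic activity layer, where they cannot disturb replay.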