
thelastbender12 | 1 year ago

I see it as a trade-off between how explicit the persisted state of a workflow execution is (rows in a database for Temporal and DBOS) and how natural it is to write such a workflow (as in your PL/compiler approach). Given that workflows are primarily used for business use-cases, with a lot of non-determinism coming from interaction with third-party services or other deployments, the library implementation feels more appropriate.

Though I am assuming building durability at a language-level means the whole program state must be serializable, which sounds tricky. Curious if you could share more?


peterkelly | 1 year ago

There's certainly a tradeoff between the two approaches; a simpler representation (list of tasks or DAG) is easier to query and manipulate, at the cost of being less expressive, lacking features like loops, conditionals, etc.

In the workflow engine I described, state is represented as a graph of objects in memory; this includes values like integers/strings and data structures like dictionaries/lists, as well as closures, environments, and the execution stack. This graph is serialised as JSON and stored in a Postgres table. A more compact binary representation could be added in the future if performance requirements demand it, but JSON has been sufficient for our needs so far. A delta between each snapshot is also written to an execution log, so that the complete execution history is available for auditing purposes.
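The snapshot-plus-delta scheme can be sketched roughly like this. This is a minimal, hypothetical illustration (not the engine's actual code), assuming state is a flat dict mapping object ids to JSON-serialisable values:

```python
import json

def delta(old, new):
    """Diff two state snapshots: changed/added keys, plus removed keys."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}

def apply_delta(state, d):
    """Replay one execution-log entry against an earlier snapshot."""
    state = dict(state)
    state.update(d["changed"])
    for k in d["removed"]:
        del state[k]
    return state

# Two successive snapshots of the object graph (ids -> values).
s0 = {"1": {"counter": 0}}
s1 = {"1": {"counter": 1}, "2": "pending"}

# Only the delta goes into the execution log; applying the logged
# deltas in order reconstructs any intermediate state for auditing.
log_entry = json.dumps(delta(s0, s1))
assert apply_delta(s0, json.loads(log_entry)) == s1
```

Storing deltas keeps the log compact while still allowing any historical state to be rebuilt by replaying entries from the last full snapshot.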

The interpreter is written in such a way that all object allocation, object manipulation, and garbage collection is under its control, and all the data needed to represent execution state is stored in a manner that can be easily serialised. In particular, we avoid the use of pointers to memory locations, instead using object ids for all references. So the persistent state, when loaded, can be accessed directly, since any time a reference from one object to another needs to be followed, the interpreter does so by looking up the object in the heap based on its id.
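The id-based heap idea can be shown with a small sketch. This is a hypothetical simplification, assuming every heap object is already in a JSON-friendly form; the point is that references are ids looked up in a table, never raw pointers, so the whole graph round-trips through serialisation:

```python
import json

class Heap:
    def __init__(self):
        self.objects = {}   # object id -> JSON-serialisable value
        self.next_id = 0

    def alloc(self, value):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = value
        return oid

    def deref(self, oid):
        # Following a reference is a lookup by id, not a pointer dereference,
        # so it works identically on freshly restored state.
        return self.objects[oid]

    def snapshot(self):
        return json.dumps({"next_id": self.next_id, "objects": self.objects})

    @classmethod
    def restore(cls, blob):
        data = json.loads(blob)
        heap = cls()
        heap.next_id = data["next_id"]
        # JSON object keys are strings; convert back to integer ids.
        heap.objects = {int(k): v for k, v in data["objects"].items()}
        return heap

# A list node refers to its elements by id, forming a serialisable graph.
heap = Heap()
a = heap.alloc(1)
b = heap.alloc("two")
lst = heap.alloc({"kind": "list", "items": [a, b]})

restored = Heap.restore(heap.snapshot())
node = restored.deref(lst)
assert [restored.deref(i) for i in node["items"]] == [1, "two"]
```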

Non-deterministic and blocking operations (including IPC receives) are handled outside of the evaluation cycle. This enables their results to be explicitly captured in the execution log, and allows for retries to be handled by an external mechanism under control of the user (since retrying can be unsafe if the operation is not idempotent).
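Capturing non-deterministic results in the log so that a resumed execution replays them rather than re-running them can be sketched as follows; the `Recorder` class is a hypothetical stand-in for the engine's actual mechanism:

```python
import random

class Recorder:
    """Record effect results on first execution; replay them on resume."""
    def __init__(self, log=None):
        self.log = log if log is not None else []
        self.pos = 0

    def effect(self, fn):
        if self.pos < len(self.log):
            # Resuming after a restart: reuse the logged result, and
            # crucially do NOT re-run the (possibly non-idempotent) effect.
            result = self.log[self.pos]
        else:
            # First execution: run the effect and append its result.
            result = fn()
            self.log.append(result)
        self.pos += 1
        return result

live = Recorder()
first = live.effect(lambda: random.random())

# A new interpreter loading the persisted log observes the same value
# without executing the effect again.
resumed = Recorder(log=list(live.log))
assert resumed.effect(lambda: 0.0) == first
```

Because retries sit outside this cycle, the user decides whether re-running a failed, non-idempotent operation is safe.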

The biggest win of using a proper language for expressing the workflow is the ability to add arbitrary logic between blocking operations, such as conditional tests or data structure manipulation. Any kind of logic you might want to do can be expressed due to the fact the workflow language is Turing-complete.
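As a concrete illustration of arbitrary logic between blocking operations, here is a hypothetical workflow mixing a loop and a conditional with two stand-in blocking steps (`fetch_order` and `notify` are invented names, not part of any real engine's API):

```python
def fetch_order(order_id):
    # Stand-in for a blocking call to an external service; in a durable
    # engine its result would be captured in the execution log.
    return {"id": order_id, "total": order_id * 60}

def notify(channel, message):
    # Stand-in for a second blocking operation.
    return f"sent to {channel}: {message}"

def workflow(order_ids):
    results = []
    for oid in order_ids:              # ordinary loop between blocking steps
        order = fetch_order(oid)       # blocking step 1
        if order["total"] > 100:       # arbitrary conditional logic
            results.append(notify("finance", f"large order {oid}"))
        else:
            results.append(notify("ops", f"order {oid}"))
    return results
```

A DAG-based representation would need special loop/branch nodes to express even this; in a Turing-complete workflow language it is just ordinary code.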

KraftyOne | 1 year ago

That's really interesting! It does seem that this is semantically identical to the library approach (since the logic your interpreter adds around steps could also be added by decorators) but completely automatic. Which is great if the interpreter always does the right thing, but problematic/overly magical if it doesn't. For example, if your problem domain has two blocking operations that really form one single step and should be retried together, a library approach lets you express that, but an interpreted approach might get it wrong.