item 22933810

Combining event sourcing and stateful systems

130 points | brendt_gd | 6 years ago | stitcher.io

40 comments

[+] evdev | 6 years ago
I think to truly be an event-driven architecture you need to go a step or two further and be data-driven.

In other words, the appropriate way to describe your system would not be (subscribable) relationships between a set of components that describe your presumptive view of a division of responsibilities. (This is the non-event driven way of doing things, but with the arrows reversed.)

Instead, you track external input types, put them into a particular stream of events, transform those events to database updates or more events, etc. Your entire system is this graph of event streams and transformations.

These streams may cut across what you thought were the different responsibilities, and you will have either saved yourself headaches or removed a fatal flaw in your design.

If you're thinking about doing work in this area, don't just reverse the arrows in your component design!
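
A minimal sketch of the "graph of event streams and transformations" idea described above: external inputs are classified into typed events on a stream, and the system is a set of transformations from streams to further streams or database updates. All event names and input shapes here are hypothetical.

```python
# Data-driven sketch: inputs -> event stream -> transformations.
# The system is described by this pipeline, not by subscribable
# relationships between pre-assigned components.

def to_events(raw_inputs):
    """Classify external inputs into typed events on a stream."""
    for raw in raw_inputs:
        if raw["kind"] == "http_order":
            yield {"type": "OrderReceived", "order_id": raw["id"]}
        elif raw["kind"] == "payment_webhook":
            yield {"type": "PaymentConfirmed", "order_id": raw["id"]}

def to_db_updates(events):
    """Transform events into database updates (could also emit more events)."""
    for event in events:
        if event["type"] == "OrderReceived":
            yield ("INSERT", "orders", event["order_id"])
        elif event["type"] == "PaymentConfirmed":
            yield ("UPDATE", "orders", event["order_id"])

raw = [{"kind": "http_order", "id": 1}, {"kind": "payment_webhook", "id": 1}]
updates = list(to_db_updates(to_events(raw)))
```

Note how the pipeline says nothing about which "component" owns orders versus payments; the streams may cut across those boundaries, as the comment suggests.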

[+] dpc_pw | 6 years ago
I'm really interested to understand your comment better.

Can you give an example for "presumptive view of a division of responsibilities" and generally the whole comment? Something like "bad way" vs "good way"? Thanks!

[+] Fire-Dragon-DoL | 6 years ago
I'm working with an event-sourced system, and we made some mistakes in the design process, so some areas that didn't need event sourcing do have it.

The biggest downside has been the UI: events are not real time, and these objects are just CRUD stuff, so the user wants to see that you saved what they just wrote. You might not have this information yet, so you need to mitigate it, for example by updating the UI through sockets (a lot of additional work).

On the upside, we are gaining a lot more insight into which business processes bring value and are meaningful, versus what I call "just configuration".

We figured out quite a few rules of thumb over time that are helpful though.

One thing I noticed over time is that on average there is no need for separate "created" and "updated" events; usually there is one meaningful business event that encompasses both (not always the case), e.g. "product listed" or something along those lines. This not only saves lines, but code reacting to this event has a reduced interaction surface (fewer bugs and less coupling), as well as being more expressive.

If you're interested, we chat a lot about event sourcing in the Eventide Slack channel: https://eventide-project.org/#community-section

[+] agentultra | 6 years ago
This is super important and I cannot stress it enough! If your events contain words like "create, update, delete, associate, disassociate," then you're building a weak domain model that won't benefit from the added complexity of deriving state from a source of events.

Your events should use the same words your customer would actually use to describe their business process. For example, a system to manage intake of patients in an ER would have events such as Patient Arrived, Patient Screen Completed, Patient Admitted, etc.

If you don't have such a vocabulary then you're not capturing interesting events, so don't store them. You probably want something that is event-driven instead, or perhaps simply to log actions to an audit table.
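
A small sketch of the vocabulary point, using the ER intake example from the comment above. The field names are hypothetical; the point is that each event type names something that *happened* in the business process, rather than a CRUD verb.

```python
# Events named in the customer's vocabulary (ER patient intake).
# Compare with a weak, CRUD-flavored event such as "PatientUpdated".
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PatientArrived:
    patient_id: str
    at: datetime

@dataclass(frozen=True)
class PatientScreenCompleted:
    patient_id: str
    triage_level: int

@dataclass(frozen=True)
class PatientAdmitted:
    patient_id: str
    ward: str
```

Code reacting to `PatientAdmitted` knows exactly which step of the process occurred; a generic "updated" event would force every consumer to diff fields to find out.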

[+] withinboredom | 6 years ago
You should take a look at Microsoft's Durable Functions which pairs event sourcing + (optional) actor model + serverless. It's some pretty neat tech.

I tried doing something similar to this several years ago, and here are a few issues I ran into:

1. Pub/sub in event sourcing is a bad idea. It's really hard to get right. (What do you do if the sub happens after the pub due to scaling issues, infrastructure, etc.?) Instead, it's better to push commands deliberately to a process manager that handles the inter-domain communication and orchestration.

2. Concurrency. Ensuring aggregates are essentially single-threaded entities is a must. Having the same aggregate id running in multiple places can cause some really fun bugs. This usually requires a distributed lock of some sort.

3. Error handling. I ended up never sending a command to a domain directly, instead I sent it to a process manager that could handle all the potential failure cases.
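
Points 1 and 3 above can be sketched together: commands are sent deliberately to a process manager that routes them between domains and owns the failure handling, instead of domains subscribing to each other's events. All names here are hypothetical.

```python
# A minimal process-manager sketch: explicit command routing plus
# centralized error handling, rather than pub/sub between domains.

class ProcessManager:
    def __init__(self, handlers):
        self.handlers = handlers   # command type -> handler function
        self.failed = []           # commands recorded for retry/compensation

    def send(self, command):
        handler = self.handlers[command["type"]]
        try:
            return handler(command)
        except Exception as exc:
            # Failure cases are handled in one place, not in every domain.
            self.failed.append((command, str(exc)))
            return None

def reserve_stock(cmd):
    """A stand-in domain handler."""
    if cmd["qty"] > 10:
        raise ValueError("insufficient stock")
    return {"type": "StockReserved", "qty": cmd["qty"]}

pm = ProcessManager({"ReserveStock": reserve_stock})
ok = pm.send({"type": "ReserveStock", "qty": 3})    # handled normally
bad = pm.send({"type": "ReserveStock", "qty": 99})  # failure captured
```

A real process manager would persist its state and retry or compensate; this only shows the routing shape the comment argues for.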

[+] slashdotdash | 6 years ago
For anyone interested in event sourcing with the actor model I've built an open source Elixir library called Commanded (https://github.com/commanded/commanded) which takes advantage of Erlang's BEAM VM to host aggregate processes. There's also an event store implemented in Elixir which uses Postgres for storage (https://github.com/commanded/eventstore).

The actor model provides the guarantee that requests to a single instance are processed serially, while requests to different instances can be processed concurrently. Distributed Erlang allows these instances to be scaled out amongst a cluster of nodes with transparent routing of commands to the instance, regardless of which connected node it is running on.

In Elixir and Erlang, the OTP platform provides the building blocks to host an aggregate instance as a process (as a `GenServer`). Following the "functional core, imperative shell" style I model the domain code as pure functions with the host process taking care of any IO, such as reading and appending the aggregate's events.
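
Commanded itself is Elixir, but the "functional core, imperative shell" split described above can be sketched language-neutrally. In this Python sketch (all names hypothetical), the core is pure: `decide(state, command) -> events` and `evolve(state, event) -> state`; the shell owns the IO of reading and appending events.

```python
# Functional core: pure functions, no IO.
def decide(state, command):
    if command["type"] == "Withdraw":
        if state["balance"] < command["amount"]:
            raise ValueError("insufficient funds")
        return [{"type": "Withdrawn", "amount": command["amount"]}]
    return []

def evolve(state, event):
    if event["type"] == "Deposited":
        return {"balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {"balance": state["balance"] - event["amount"]}
    return state

# Imperative shell: hosts the aggregate and does the IO
# (a plain list stands in for the event store here).
class AggregateShell:
    def __init__(self, store):
        self.store = store

    def handle(self, command):
        state = {"balance": 0}
        for event in self.store:          # read the aggregate's events
            state = evolve(state, event)
        new_events = decide(state, command)
        self.store.extend(new_events)     # append the new events
        return new_events
```

In Commanded the shell role is played by a `GenServer` process per aggregate; here it is just a class, so only the shape carries over.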

[+] sbellware | 6 years ago
> Pub/sub in Event Sourcing is a bad idea

I find this point surprising. I would say the exact opposite. I would say that pub/sub and event sourcing are two sides of the same coin: events.

> what to do if sub happens after pub

That should only ever be a problem with a non-durable transport that doesn't have serialized writes per topic. Which, admittedly, can be pretty common. But it's not so much an event sourcing or pub/sub issue as much as a choice of message transport issue.

> Concurrency. Ensuring aggregates are essentially single-threaded entities is a must. Having the same aggregate id running in multiple places can cause some really fun bugs. This usually requires a distributed lock of some sort.

Or it requires partitioning the queues and using an optimistic lock when writing (just to be on the safe side).
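
The optimistic-lock alternative mentioned above can be sketched as an expected-version check on append: each writer states the stream version its decision was based on, and the store rejects the write if the stream has moved on. The class and names here are hypothetical.

```python
# Optimistic concurrency on append: no distributed lock, just a
# version check at write time.

class ConcurrencyError(Exception):
    pass

class EventStream:
    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, events, expected_version):
        # Reject the write if another writer got in first.
        if self.version != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream at {self.version}")
        self.events.extend(events)

stream = EventStream()
v = stream.version
stream.append([{"type": "A"}], expected_version=v)      # succeeds
conflict = False
try:
    stream.append([{"type": "B"}], expected_version=v)  # stale: rejected
except ConcurrencyError:
    conflict = True
```

The losing writer typically reloads the stream, re-runs its decision against the new state, and retries.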

[+] vorpalhex | 6 years ago
On (1): You're correct that pub/sub is difficult to get right, but it can confer a bunch of benefits that make it worthwhile to struggle with. RabbitMQ + replay-ability (the details of which will differ based on your design) + good data model design is usually a safe bet here.
[+] alextheparrot | 6 years ago
Is (2) motivated by using aggregates which do not commute or by trying to do distributed modifications on a single value?

One common technique, if you have commutative aggregates, is to have each writer just write to their own spot and then do a range query on read to re-join. If your aggregates don't commute this, of course, doesn't work and you're stuck in "single-threaded" land. I do remember reading a paper that avoided this, but I can't remember the implementation / trade-offs (if I can find the paper I'll post it here).
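
A sketch of the per-writer-slot technique described above, using counters (all names hypothetical): each writer appends only to its own key, so there is no cross-writer contention, and a read re-joins all slots with a range query. This only works because addition commutes, so the order of writes does not matter.

```python
# Commutative aggregates: one slot per (aggregate, writer),
# re-joined at read time.
from collections import defaultdict

slots = defaultdict(int)   # (aggregate_id, writer_id) -> partial count

def record(aggregate_id, writer_id, delta):
    # Each writer touches only its own slot: no contention, no lock.
    slots[(aggregate_id, writer_id)] += delta

def read(aggregate_id):
    # "Range query on read to re-join" all writers' slots.
    return sum(v for (agg, _), v in slots.items() if agg == aggregate_id)

record("page:1", "writer-a", 2)
record("page:1", "writer-b", 3)
record("page:2", "writer-a", 5)
total = read("page:1")
```

For non-commutative aggregates (e.g. "withdraw only if balance suffices") the partial slots cannot be merged this way, which is the "single-threaded land" the comment refers to.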

[+] agentultra | 6 years ago
I did a formal model of an event-sourced system used in production, and it was quite illuminating. It turns out that concurrency is something one should take into account when designing these systems. Versioning often refers to two things:

1. The event data itself; when business cases change or understanding grows we wish to add, remove, or change the type of different fields in an event record.

2. The current state of a projected model

The latter is what requires some form of coordination; otherwise you can end up with events being written in an incorrect order, producing the wrong state.

It is a good idea though to avoid event sourcing all of your models. Microsoft wrote about their experiences implementing an event-sourced application and how they reached that conclusion [0]. In my experience it's because of temporal properties: event sourced systems are inherently eventually consistent systems. When you have domain models that depend on one another you will need to be quite certain that A eventually leads to B which eventually leads to C and that if a failure happens along the way that nothing is lost or irrecoverable.

[0] https://docs.microsoft.com/en-us/previous-versions/msp-n-p/j...

[+] gen220 | 6 years ago
We encounter a similar problem at my current job (mixing systems that we want to keep stateful with systems that we want to make “real-time”/stream-based).

I think you’ve covered most of the problems you’ll encounter. One thing that sticks out to me is downtime: how will your order subscriber handle a product publisher that’s down or otherwise delayed? Then the events will potentially be out of order; is that a problem for you?

On another note, we follow the same bounded-context principles, but we implemented them with Kafka + Confluent, since that infrastructure and those libraries were already available. Teams make their data accessible via a mix of “raw” Kafka topics and very refined gRPC services. Your subscriber is implemented as a cron job that reads from N streams and “reduces” them to 1 stream.

FWIW, we also store a transaction log in each of our databases, so we can generate a stream of object states relatively easily later on. This has helped a lot with converting old tables into streams, and vice versa.

The only thing that’s a persistent issue is schema changes. My only recommendation there is to never make them... In all seriousness, keep your data models small, and whenever you want to experiment with a schema change, add the new data as an FK’d table with its own transaction log, rather than as a schema mutation to your core table. It’s never worth the headache if you take data integrity seriously.

[+] stormageddon | 6 years ago
How long did it take you to come up with this approach? How many meetings, etc.? As a lone developer I always wonder about this stuff.
[+] brendt_gd | 6 years ago
It took several hours of individual research (watching talks, reading blog posts), plus multiple multi-hour pair-programming sessions over the span of four weeks, to come up with a solution we liked.

We informed our client that this was a new area for us, one we didn't have hands-on experience with, but that we believed it would be beneficial to spend time exploring it, as it would be an elegant solution to several of their business problems. They agreed, and we kept them in the loop with weekly meetings.

We're now in the phase of implementing real-life processes; the project will probably be in active development for another year or two.

[+] carapace | 6 years ago

    Events + state = state machine
I was working (briefly) at a startup once and we were having a meeting and the CTO sketched out his idea for our internal architecture and I looked at it and thought, "That's ethernet." I quit that job.

(Technically, I was let go. Friday I went to head of HR and said, "I think I'm gonna quit." Monday morning I was laid off. * shrug *)
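
The "Events + state = state machine" one-liner quoted above can be written as a fold: the current state is a pure function of an initial state and the event sequence. The order lifecycle used here is a hypothetical example.

```python
# State as a left fold over events: a tiny state machine.
from functools import reduce

def transition(state, event):
    # Allowed transitions for an order's lifecycle.
    allowed = {
        ("new", "paid"): "paid",
        ("paid", "shipped"): "shipped",
    }
    # Events that don't apply in the current state are ignored.
    return allowed.get((state, event), state)

events = ["paid", "shipped"]
final = reduce(transition, events, "new")
```

Replaying the same events from the same initial state always yields the same final state, which is the property event sourcing builds on.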

[+] jdkoeck | 6 years ago
I don't get what this has to do with ethernet.
[+] animeshjain | 6 years ago
I would be interested in knowing how the reactors handle side effects which should never be replayed. Is there some well-established pattern for doing this?
[+] agentultra | 6 years ago
Reactors can keep their own state, including their current position in the event stream. When a replay is initiated, a reactor ignores events older than its current "head."
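
A minimal sketch of the checkpointing described above (names hypothetical): the reactor stores the position of the last event it handled, so a replay skips events at or before its "head" and side effects are never repeated.

```python
# A reactor that checkpoints its position so replays don't
# re-trigger side effects.

class Reactor:
    def __init__(self):
        self.head = -1             # position of the last handled event
        self.side_effects = []     # stand-in for emails, payments, etc.

    def handle(self, position, event):
        if position <= self.head:
            return                 # already seen: skip, don't re-run the effect
        self.side_effects.append(event["type"])
        self.head = position       # checkpoint after the effect

reactor = Reactor()
log = [{"type": "EmailRequested"}, {"type": "InvoiceRequested"}]
for pos, ev in enumerate(log):
    reactor.handle(pos, ev)
for pos, ev in enumerate(log):     # full replay: every event is skipped
    reactor.handle(pos, ev)
```

In production the head would be persisted atomically with (or after) the side effect, since a crash between effect and checkpoint is what makes exactly-once hard.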