
Moving from relational data to events

240 points | alexzeitler | 2 years ago | event-driven.io

134 comments

[+] asah|2 years ago|reply
2c: if you need PostgreSQL elsewhere in your app anyway, then store your event data in PostgreSQL + FOSS reporting tools (apache superset, metabase, etc) until you hit ~2TB. After that, decide if you need 2TB online or just need daily/hourly summaries - if so, stick with PostgreSQL forever[1]. I have one client with 10TB+ and 1500 events per sec @ 600 bytes/rec (80GB/day before indexing), 2 days of detail online and the rest summarized and details moved to S3 where they can still query via Athena SQL[2]. They're paying <$2K for everything, including a reporting portal for their clients. AWS RDS multi-AZ with auto-failover (db.m7g.2xlarge) serving both inserts and reporting queries at <2% load. One engineer spends <5 hours per MONTH maintaining everything, in part because the business team builds their own charts/graphs.

Sure, with proprietary tools you get a dozen charts "out of the box", but with pgsql your data is in one place, there's one system to learn, one system to keep online/replicate/backup/restore, one system to secure, one system to scale, one vendor (vendor-equivalent) to manage, and millions of engineers who know the system. Building a dozen charts takes an hour in systems like Preset or Metabase, and non-technical people can do it.

Note: I'm biased, but over 2 decades I've seen databases and reporting systems come & go, and good ol' PostgreSQL just gets better every year.

https://instances.vantage.sh/aws/rds/db.m7g.2xlarge?region=u...

[1] if you really need it, there are PostgreSQL-compatible systems for additional scaling: Aurora for another 3-5x, TimescaleDB for 10x, CitusDB for 10x+. Each comes with tradeoffs for being slightly non-standard, so I don't recommend using them until you really need them.

[2] customer reporting dashboards require sub-second response, which is provided by PostgreSQL queries to indexed summary tables; Athena delivers in 1-2 sec via parallel scans.
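The "keep recent detail online, summarize the rest" pattern asah describes can be sketched roughly like this (a toy sketch, with SQLite standing in for PostgreSQL; the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    occurred_at TEXT NOT NULL,   -- ISO-8601 timestamp
    kind TEXT NOT NULL,
    amount REAL NOT NULL
);
-- Hourly rollups are kept forever; raw detail only for recent data.
CREATE TABLE hourly_summary (
    hour TEXT NOT NULL,
    kind TEXT NOT NULL,
    n INTEGER NOT NULL,
    total REAL NOT NULL,
    PRIMARY KEY (hour, kind)
);
""")

conn.executemany(
    "INSERT INTO events (occurred_at, kind, amount) VALUES (?, ?, ?)",
    [("2024-01-01T10:05:00", "sale", 10.0),
     ("2024-01-01T10:30:00", "sale", 5.0),
     ("2024-01-02T09:00:00", "refund", 3.0)],
)

def roll_up_before(conn, cutoff):
    """Summarize detail rows older than `cutoff`, then delete them
    (in the real setup they'd be exported to S3 first)."""
    with conn:  # one transaction: summarize and purge together
        conn.execute("""
            INSERT INTO hourly_summary (hour, kind, n, total)
            SELECT substr(occurred_at, 1, 13), kind, COUNT(*), SUM(amount)
            FROM events WHERE occurred_at < ?
            GROUP BY 1, 2
        """, (cutoff,))
        conn.execute("DELETE FROM events WHERE occurred_at < ?", (cutoff,))

roll_up_before(conn, "2024-01-02T00:00:00")
print(conn.execute("SELECT hour, kind, n, total FROM hourly_summary").fetchall())
# → [('2024-01-01T10', 'sale', 2, 15.0)]
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # → 1
```

The indexed summary tables are what the sub-second customer dashboards would hit; the purged detail would live in S3 for Athena.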

[+] move-on-by|2 years ago|reply
I was on a team once that strongly considered event sourcing. To me, it seemed like a solution looking for a problem. It could have worked for us, but we ended up passing on it, as the benefits were not immediately clear, and the risk of doing something new, with all the lessons learned that would come with it, just didn't seem in the best interest of the project/company. Maybe that makes us tools for passing up a learning opportunity, but I don't regret staying out of that rabbit hole when there was no fox chasing us down it.
[+] Nextgrid|2 years ago|reply
A boring, conventional system that works is a threat to a bloated engineering team who don't have any work to do, have nothing to polish their resumes with, and might feel at risk of redundancy. That is the "problem" this solution solves.
[+] alecco|2 years ago|reply
Temporal databases make a lot of sense for financial data, for example.

But in most cases you can just have a normal database and store the historic changes in auxiliary tables. So the main database is kind of a materialized view.
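A minimal sketch of that auxiliary-table idea, assuming a trigger-based history log (SQLite syntax for illustration; all names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL);

-- Auxiliary history table: every change to the "main" table is recorded,
-- so the current state is effectively a materialized view of the history.
CREATE TABLE account_history (
    account_id INTEGER NOT NULL,
    old_balance REAL,
    new_balance REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER account_audit AFTER UPDATE ON account
BEGIN
    INSERT INTO account_history (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = 80.0 WHERE id = 1")
conn.execute("UPDATE account SET balance = 95.0 WHERE id = 1")

print(conn.execute("SELECT balance FROM account WHERE id = 1").fetchone())
# → (95.0,)
print(conn.execute(
    "SELECT old_balance, new_balance FROM account_history").fetchall())
# → [(100.0, 80.0), (80.0, 95.0)]
```

The application keeps querying the main table as usual; the history table gets the temporal record for free.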

[+] devjab|2 years ago|reply
Almost every piece of data we store in SQL would be better off in a document database, but since nobody is familiar with those we keep on trucking. I don't mind too much, and I don't even think we made the wrong choice, but it does cause us some issues with how we have to handle data model changes.

I think most data storage didn't really keep pace with how a lot of software is being built now, though, and things like events and queues are what we build on top of what we have because we need them. For the most part, a lot of the data relations happen outside of our databases today, through various services, because that's just how the modern IT landscape looks in many organisations. You'll have internal master data that supports different teams in the business and interacts with 300+ different IT systems and applications in order to streamline things. With microservices it's easy to keep the business logic and data models clean, but then you need to manage events, queues and data states as well as reliable storage. Which is just so complicated right now.

I do like SQL, but these days the systems we're building could frankly be put in SQLite and be perfectly fine. Well, almost.

[+] asimpletune|2 years ago|reply
Something that might be missing from these discussions is when event driven architecture is even appropriate. The short answer: if your customer did something and expects a response, it's not event driven, that's just request/response.

Event driven is when something happens out of band. E.g. you push your code to GH, which triggers a build. In this example, reloading the page to see your updated code is request/response; the CI build that was enqueued is event driven.

Hope that helps.

[+] berkes|2 years ago|reply
It's not that simple. Request-response isn't a factor to choose ES or ED architectures on.

You can have request-response, inline, blocking cycles with ES or ED. And you can have async without ES or ED just fine too (e.g. workers, queues, actors, multithreading, etc.).

[+] audnaun252|2 years ago|reply
Modelling domain events is useful for describing the problem you're trying to solve with the domain experts, and it should probably be left in the documentation when planning a solution.

For actually implementing a system that provides an audit trail of long-lived state machines, you're probably better off using something like Temporal.io/durable functions which uses event sourcing internally for their persistence, and has a programming model which forces you to think about deduplication/idempotency by adding different constraints for the code that orchestrates the functionality (workflows), vs the code that actually interacts with the real world (activities)

[+] DenisM|2 years ago|reply
Durable functions suffer from a lack of observability tho.

I’d love to hear suggestions on overcoming this issue.

[+] colonwqbang|2 years ago|reply
The concept sounds interesting, but the article doesn't do a great job of explaining how it works. How do I efficiently reconstruct the current state from the event stream? How would the event stream be modelled in the database?
[+] corethree|2 years ago|reply
Two ways to do it.

1. Use a database designed for this stuff: Google BigQuery, Amazon Redshift, ClickHouse, etc. All current data is essentially a type of aggregation. Or in other words, it's equivalent to a group-by query on an event database.

It makes sense right? With events I can technically rebuild the current state or the past state of the data through some aggregation query.

2. Rename your relational storage and call it a caching layer that lives next to the event system. It's functionally the same thing but won't trigger any red flags in people who are obsessed with making everything event driven.

The architecture he describes exists. It's just massively complicated, so services that utilize it usually do very targeted things. Think Google Analytics, Datadog, Splunk, etc.
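Point 1, current state as a group-by/aggregation over the event stream, can be pictured with a minimal sketch like this (SQLite for illustration; the schema and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        seq INTEGER PRIMARY KEY,      -- monotonically increasing event order
        entity_id INTEGER NOT NULL,
        value TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO events (entity_id, value) VALUES (?, ?)",
    [(1, "placed"), (1, "shipped"), (2, "placed"), (1, "delivered")],
)

# Current state = the latest event per entity, i.e. an aggregation
# (here a correlated MAX) over the raw event stream. Past states can
# be rebuilt the same way by adding a "seq <= ?" cutoff.
current = conn.execute("""
    SELECT entity_id, value FROM events e
    WHERE seq = (SELECT MAX(seq) FROM events e2
                 WHERE e2.entity_id = e.entity_id)
    ORDER BY entity_id
""").fetchall()
print(current)  # → [(1, 'delivered'), (2, 'placed')]
```

The columnar databases mentioned above are built to run exactly this kind of scan-and-aggregate query fast over billions of rows.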

[+] nivertech|2 years ago|reply
It's top-down vs bottom-up, or custom vs generic.

Top-down vs bottom-up:

Top-down: starting from the business domain, and then mapping an implementation onto available technologies, tools, and vendors.

Bottom-up: starting from the available technologies, tools, and vendors, and thinking about how to bolt together a working solution out of them.

Custom vs generic:

Custom: DDD, CQRS/ES, Sagas, TBUI (Task-based/driven UI), GraphQL, Algebraic Data Types, etc.

Generic: RDBMS, CRUD, REST, ACID transactions, CDC, generic admin UIs, nocode/lowcode, limited/generic types, etc.

[+] tiku|2 years ago|reply
Yeah, ehh I'm just going to stick to good old fashioned relational data.
[+] agentultra|2 years ago|reply
Good, do it until you can’t. Don’t use a hammer on a screw.
[+] ChicagoDave|2 years ago|reply
I'm on board with event-based architectures, but this article struggles to get its point across.

I would focus on the difference between data relations and business behaviors. Once you start thinking in terms of behaviors and business activities, the move away from operational relational data stores becomes much more obvious.

[+] 3abiton|2 years ago|reply
On an abstract level, events can be modelled as relations.
[+] erikpukinskis|2 years ago|reply
Event sourcing has a lot of nice properties, so I’m intrigued. But don’t you still need relations? And then how do you implement those?

If the answer is “they’re all implicit in the application layer code” then that’s not really acceptable. I still need some way to query for relations, or keep relation views up to date, or something like that.

I don’t mind if relations are not core to your persistence model, but they have to be implemented _somewhere_ in your data layer, and I’m not seeing any mention of that here.

I have the same issue with Firestore, everyone does relations _somehow_ but it’s all just spaghetti application code which isn’t scalable.

[+] revskill|2 years ago|reply
No, what you need is a command queue; a command event is not a domain event.
[+] jstummbillig|2 years ago|reply
I was not aware of event driven design until recently, but coincidentally concluded something like it, after considering the optimal data structure in an AI powered world.

While it's clear how event driven design might have been worth the trade-off before (assuming you were able to manage the complexities and actually made use of the data), being able to query an AI with knowledge of every event that ever happened to your business will make it ubiquitous over the coming years.

[+] zabzonk|2 years ago|reply
all the comments are negative, but the post at this time has 64 upvotes - why? i've seen this so often on HN, but i really don't understand it.
[+] lolinder|2 years ago|reply
Certain topics get a lot of interest (positive and negative) based on the title alone.

Event-driven isn't quite peak hype any more, but it still gets a lot of instinctive love from a certain group of people, and a lot of instinctive hate from another. So you get a whole bunch of upvotes from people who don't have anything substantial to say about it, then a whole bunch of negative reactions in the comments based on the title alone. And then in this case, you get a whole bunch of negative reactions from people who tried to read the article and couldn't get past the weird tone.

[+] simonbarker87|2 years ago|reply
Hate the article enough to leave a negative comment, want to see the fallout from it and have it not drop off the front page, so stick an upvote on it as well, would be my guess. I have commented on this but haven't upvoted, as it's clearly a bonkers article.
[+] jupp0r|2 years ago|reply
Because you can't downvote submissions, you can only go in and write a negative comment. Makes complete sense to me to see the effects of that with this article.
[+] alecco|2 years ago|reply
This article is not good. Event Sourcing and the Relational Model are orthogonal.

SQL:2011 added a lot of temporal features. [1]

Datomic is based on Datalog, which, even though it's not relational, is kind of the same thing, and it has temporal support. [2]

[1] https://en.wikipedia.org/wiki/SQL:2011

[2] https://vvvvalvalval.github.io/posts/2018-11-12-datomic-even...

(BTW 24 points at the top of HN and no comments? Hmm)

[+] ithkuil|2 years ago|reply
Datalog is relational in the original sense of relational algebra.
[+] hot_gril|2 years ago|reply
You don't need specific temporal support for an event-oriented DB, nor would I want it. I usually design a vanilla relational schema around events regardless of which DBMS I'm using. E.g. instead of an "order" table with multiple states updated in-place, I'd have "order_placed" and "order_filled" where each row is an event, insert-only.
[+] simonbarker87|2 years ago|reply
What on earth is this article trying to accomplish? The tone is bizarre, and the underlying concept sounds horrendous to work with if you truly want to replace your static data store with it. By all means add a formal event layer on top of your existing data store, but replacing it sounds like madness.

If that's not what the article is proposing, then for once I'm going to say it's not a failure of my intelligence; it's the article's fault here.

[+] mrkeen|2 years ago|reply
> What on earth is this article trying to accomplish?

Most articles explain building event-driven systems from a greenfield point of view. This article is for when you want to build an event-driven system but you already have brownfield relational data.

[+] AtlasBarfed|2 years ago|reply
Four years ago I heard "Kafka IS your database".

I thought maybe these insane people (probably parroting some tech company enterprise penetration propaganda e.g. Confluent) would have a better story, but... no.

Anyway, yeah, sure, keep logs. But a lot of that article about commands and events is something that only exists if you had one ubiquitous language, system, and OS. You know, the almost literal "seamless" where there aren't any seams.

Sure that will probably plug into some enterprise bus and enterprise integration and enterprise ... anyway.

Competent developers will understand what events to preserve and log and possibly allow retry/repeats.

Anyone who has looked at a Splunk bill will realize that just storing all the logs everywhere is very expensive, which is another way of saying "wasteful". But any generic enterprisey event system will basically start and end with splunk-level log aggregation and kinda-analysis.

[+] alephnan|2 years ago|reply
> What on earth is this article trying to accomplish

Most of these articles are for the author to promote themselves

[+] corethree|2 years ago|reply
Eh. The model he describes is actually standard for analytics.

And because it's standard there are literally databases designed and optimized to do what he says. It's not madness when it already exists and is really common.

Think Redshift, Snowflake, BigQuery, ClickHouse...

Additionally, there already exist user interfaces and web services that do what he says.

Datadog, Splunk, Google Analytics... Anything related to logs, analytics and aggregation of those analytics. What he proposes actually already exists.

That being said, I don't agree with the article's point about replacing everything with this model. Usually these types of services target very specific use cases.

I think your reaction is a bit extreme here. I don't agree with his proposed model but I see where he's coming from and it's not that the model won't work... It's been proven to work from all the examples I gave above.

The problem with it is that it's just slower and much more complicated. But his proposal does increase the capabilities of your data.

You can increase speed by having a pre-caching layer for your aggregations. Basically, what was originally your static store is now a caching layer: the developer or user pre-specifies an aggregation that the system should count live as the events come in, while also throwing the events into the event db. If, when querying for that aggregation, you get a "cache miss", it hits the event layer and has to do the aggregation live.

So essentially, if you build it like this, you have all the capabilities and speed of your original static data store, but now you have the ability to re-aggregate events differently, so you have MORE ways to deal with your data. It can work, and it will have more features; it's just really, really, really complicated to make an entire system centered around events. Additionally, there's a boatload of extra data to deal with, which is another engineering problem.
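That pre-specified-aggregation cache with a live fallback might look roughly like this (a toy sketch; all names and the structure are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, amount REAL)")

# Pre-specified live aggregations ("the cache"); anything not tracked
# here falls back to scanning the event store.
cache = {}

def record(kind, amount):
    """Append the event, and keep any live counter for it current."""
    conn.execute("INSERT INTO events VALUES (?, ?)", (kind, amount))
    if kind in cache:
        cache[kind] += amount

def total(kind):
    if kind in cache:                       # cache hit: O(1)
        return cache[kind]
    # Cache miss: aggregate over the event store, then track it live.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM events WHERE kind = ?",
        (kind,)).fetchone()
    cache[kind] = row[0]
    return row[0]

record("sale", 10.0)
record("sale", 5.0)
print(total("sale"))   # cache miss, computed from events → 15.0
record("sale", 2.5)
print(total("sale"))   # cache hit, updated live → 17.5
```

The point of the design: the "static store" (here, `cache`) is disposable, since any aggregation can always be recomputed from the event log.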

That's why when people do build these systems it's usually centered around some business requirement that absolutely needs this ability to dynamically query and aggregate events. Logs and analytics being the two big ones. Or some service to data scientists as well.

The theory behind it is attractive. All static data can be represented as a series of events. In fact, static data is simply the result of a certain kind of aggregation query on an event database. It's attractive to use smaller primitives in programming and build higher level abstractions through composition, so this style of event driven services seems more fundamental and proper. But of course, like I said, there are practical issues with it when you look past the theory, such that this model is usually only applied to the specific use cases I mentioned above.

So there is a failure here. Not of your intelligence. Failure of your experience.

And as a side note, I agree with you on the tone of the article. He's trying to be witty, but he's trying too hard.

[+] candiddevmike|2 years ago|reply
Where's the hitchhikers guide to moving back to relational data after our whiz bang dev made us event driven and left for a new opportunity?
[+] xwowsersx|2 years ago|reply
I was looking forward to reading this based solely on the title, but I find the writing style and tone to be quite unbearable. The forced attempt at being relatable and light-hearted comes across as patronizing and distracts from the intended message or points being conveyed.
[+] hot_gril|2 years ago|reply
I don't take offense to the tone, it's just too much text and too little substance.
[+] CodeCompost|2 years ago|reply
Didn't read the article, but I'm in the process of eradicating Event Sourcing from a codebase and returning to the classical ACID database model. The boneheaded decisions made by our predecessors are staggering, and choosing to use Event Sourcing for everything is the dumbest of them all.
[+] lolinder|2 years ago|reply
I worked on a project where we decided that event sourcing was the way to go for a variety of legitimate business needs. We then implemented it with ACID transactions—an application-level framework writes the event to the Postgres database and in the same transaction updates all the computed views. At the scale we were working, this was totally fine performance-wise.

Most people who have had bad experiences with event sourcing were actually having bad experiences with eventual consistency. All that event sourcing means is that you treat the events as the source of truth and everything else as computed from those events (and you could theoretically recompute it all at any time). Eventual consistency is an implementation detail and not a necessary one: you can implement event sourcing in a single Excel file if need be.
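The same-transaction approach described above might look roughly like this (SQLite standing in for Postgres; the schema and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (          -- source of truth, append-only
    seq INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    delta REAL NOT NULL
);
CREATE TABLE balance_view (    -- computed view, recomputable from events
    account_id INTEGER PRIMARY KEY,
    balance REAL NOT NULL
);
""")

def apply_event(account_id, delta):
    # Write the event and update every computed view in ONE transaction,
    # so readers never see the view lag behind the event log.
    with conn:
        conn.execute("INSERT INTO events (account_id, delta) VALUES (?, ?)",
                     (account_id, delta))
        conn.execute("""
            INSERT INTO balance_view (account_id, balance) VALUES (?, ?)
            ON CONFLICT (account_id) DO UPDATE
            SET balance = balance + excluded.balance
        """, (account_id, delta))

apply_event(1, 100.0)
apply_event(1, -30.0)
print(conn.execute("SELECT balance FROM balance_view WHERE account_id = 1")
      .fetchone())  # → (70.0,)
# The event log stays the source of truth: the view can be recomputed
# from it at any time.
print(conn.execute("SELECT SUM(delta) FROM events WHERE account_id = 1")
      .fetchone())  # → (70.0,)
```

This gives strong consistency (no eventual-consistency window) while keeping the event-sourcing property that the views are derived, disposable data.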

[+] capableweb|2 years ago|reply
Ok? Not sure what you want to talk about here; you probably need to give us a bit more context, especially when you acknowledge you haven't even opened the article to talk about the submission itself...

Sometimes, the situation when something gets created and designed, looks very different from the current situation you're in N years later. So what might have looked like a boneheaded decision, could have been the best decision at that point.

But us engineers like to lament our predecessors' code, I'm guilty of this sometimes too. But I try to remember that I don't have the full context of how things were when the code was initially written.

[+] politician|2 years ago|reply
What are the major deficiencies of ES in that particular codebase? If you could drop some specifics in bullets that would be really helpful for me. I promise not to ambush you with apologetics.
[+] hot_gril|2 years ago|reply
Just beware, if you're using Postgres or MySQL, it's not fully ACID (specifically "I") unless you run xacts in serializable mode.