top | item 46960422

(no title)

1 points| TheEdonian | 20 days ago

discuss

order

TheEdonian|20 days ago

__This is a summary of a somewhat long article, it cuts a lot corners due to character limits. Please check the article for more info.__

Some years ago I worked with a scale-up that was really focused on the way they handled data in their product. At some point they started to talk about standardizing their data transfer objects, the data that flows over the API connections, in these common models. The idea was that there would be a single Invoice, User, Customer concept that they can document, standardize and share over their entire application landscape. What they were inventing is now known as a Canonical Data Model. A centralized data model that you reuse for everything. And to be fair to that team, there are companies that make this work. Especially in highly regulated environments you can see this in play for some objects. In banks or medical companies it’s not uncommon to have data contracts that need to encapsulate a ledger or medical checks.

## Bounded context When that team was often talking about domain driven design concepts (value objects, unambiguous language) they seemed to miss the domain part. More specifically, the bounded context. A customer can mean a lot of things to a lot of different people. This is the bounded context. For a sales person a customer is a person that buys things, for a support person they are a person that needs help. They both have different lenses. Now if we keep following the Canonical Data Model, this Customer object will keep on growing. Every week there will be a committee that decides what fields need to be added (you cannot remove fields as that impacts your applications).

## Enter the Data Mesh A way to solve this, is data mesh. This takes the concept of bounded context as a core principle. In the context of this discussion, data mesh sees data as a product. A product that is maintained by the people in the domain. That means that a customer in the Billing domain only maintains and focuses on the Billing domain logic in the customer concept. They are responsible for the quality and contract but not for the representation. That means in practice that they can decide how a VAT number is structured. But not how the Sales team needs to format said model. They have no control or interest in how other domains use the data. It’s a very flexible design but while Data Mesh solves the coupling problem, it introduces a new set of challenges. If I’m an analyst trying to find ‘Customer Revenue,’ do I look in Sales, Billing, or Marketing? The answer is usually ‘all of the above.’ In a pure Mesh, you don’t make multiple calls, you have to build multiple Anti-Corruption Layers just to get a simple report. It requires a high level of architectural maturity and that is something not every low-code or legacy team possesses.

## Federated Hub-and-Spoke Data Strategy Let’s try and see if we can combine these two strategies. We centralize our data in a central lake. Yes, that is back to the CDM setup. But we split it up in federated domains. You have a base Customer table that you call CustomerIdentity that is connected to a SalesCustomer, SupportCustomer, … Think of this as logical inheritance, a ‘CustomerIdentity’ record that is extended by domain-specific tables through a shared primary key. When you create a new Customer in your sales tool you trigger an event. The CustomerCreate event. The CustomerCreate trigger fills out the base information for the Customer (username, firstName, lastName) in the central data lake, at the same time we store our customer (base and domain specific data) in our local database. You also do this for delete and update events. The base information goes to the server, the domain specific data stays on the sales tool as a single source of truth. Every night there is a sync of the domain tools to the central lake to fill out the domain tables with a delta