top | item 37422870

smif | 2 years ago

> Re: ORMs, I respectfully disagree. I've come across many teams that treat their Python/Rust/Go codebase with ownership and craft, but I have not seen the same said about SQL queries. It's almost like a 'tragedy of the commons' problem - columns keep getting added, logic gets patched, more CTEs get layered on to abstract things out, but in the end they add to the obfuscation.

> ORMs don't fix everything, but they do help constrain the 'degrees of freedom' and keep logic repeatable and consistent, and they're generally better than writing your own string-manipulation functions. An idea I had (I wrote the post early last year) was to use static analysis tools like Meta's UPM to allow refactoring of tables / DAGs (keep interfaces the same, but with 'flatter' DAGs and fewer duplicate transforms).
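To make the "constrained degrees of freedom" point concrete, here's a toy sketch - not any particular ORM, and all names (`User`, `UserRepo`) are invented for illustration. A thin repository layer over the stdlib `sqlite3` module forces every query through parameterized statements, instead of each call site hand-building SQL strings:

```python
import sqlite3
from dataclasses import dataclass

# Toy illustration (not any real ORM): a thin mapping layer that
# constrains how queries get built, versus ad-hoc string concatenation
# scattered across the codebase.

@dataclass
class User:
    id: int
    name: str

class UserRepo:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, user: User) -> None:
        # Parameterized: values are never spliced into the SQL string.
        self.conn.execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (user.id, user.name)
        )

    def by_name(self, name: str) -> list[User]:
        rows = self.conn.execute(
            "SELECT id, name FROM users WHERE name = ?", (name,)
        )
        return [User(*row) for row in rows]

conn = sqlite3.connect(":memory:")
repo = UserRepo(conn)
repo.add(User(1, "ada"))
print(repo.by_name("ada"))  # [User(id=1, name='ada')]
```

The point isn't the mapping itself so much as that every read and write goes through one small, reviewable surface, which is the repeatability/consistency the quoted comment is after.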

I get what you're saying, but think about a large org with a lot of different teams and heterogeneous data stores - it's gonna be pretty hard to implement a top-down directive telling everyone to use such and such ORM library, or to ensure a common level of ownership and craft. This is where SQL is the lingua franca: it's usually the native language of the data stores themselves, and it's a common factor between most if not all of them. It's also where tools like Trino / PrestoSQL can come in and provide a compatibility layer at the SQL level, while also providing really nice features like joins across different kinds of data stores, query optimization, caching, access control, and compute resource allocation.
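Trino does cross-store joins inside the engine, across its catalogs; the idea can be sketched in miniature by pulling rows out of two unrelated backends and joining them in one place. Everything here (the two "stores", the table, the keys) is invented for illustration:

```python
import sqlite3

# Toy sketch of what a federation layer like Trino does conceptually:
# fetch rows from two heterogeneous stores and join them in one place.

# Store 1: a relational table (stand-in for a warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1), (101, 2)])

# Store 2: a key-value store (stand-in for e.g. a document DB).
kv_store = {1: "ada", 2: "grace"}

def federated_join() -> list[tuple[int, str]]:
    # Conceptually: SELECT o.order_id, u.name FROM orders o JOIN users u ...
    # done manually across the two backends.
    rows = warehouse.execute(
        "SELECT order_id, user_id FROM orders ORDER BY order_id"
    )
    return [(order_id, kv_store[user_id]) for order_id, user_id in rows]

print(federated_join())  # [(100, 'ada'), (101, 'grace')]
```

A real engine additionally plans the join, pushes predicates down into each store, and handles types/auth - that's the part you get from the federation layer rather than writing it per team.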

In general it's hard to get things to flow "top down" in larger orgs, so it's better to address as much as you can from the bottom up. This includes things like domain models - it's gonna be tough to get everyone to accept a single domain model, because different teams work at different levels of focus and granularity as they zoom into specific subsets, so they will tend to interpret the data in their own ways. That's not to say any of them are wrong; there's a reason the whole data lake concept of "store raw, unstructured data" came about, where the consumer enforces a schema on read. That gives consumers the power to look at the data from their own perspective and interpretation. The more interpretation and assumptions you bake into the data before it reaches the consumers, the more problems you tend to run into.
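Schema-on-read can be sketched in a few lines - the event shape and both consumers here are made up, but the mechanics are the point: the lake keeps the raw bytes untouched, and each team parses/interprets them at read time:

```python
import json

# Schema-on-read sketch: raw events are stored as ingested; each
# consumer applies its own interpretation when it reads.

raw_events = [
    json.dumps({"type": "purchase", "amount": "20.00", "user": {"id": 1}}),
    json.dumps({"type": "refund", "amount": "5.00", "user": {"id": 1}}),
]

def finance_view(raw: str) -> float:
    # Finance cares about signed revenue, parsed into a number on read.
    e = json.loads(raw)
    sign = -1 if e["type"] == "refund" else 1
    return sign * float(e["amount"])

def activity_view(raw: str) -> int:
    # Another team only cares which users were active - a coarser grain,
    # read from the exact same stored records.
    return json.loads(raw)["user"]["id"]

print(sum(finance_view(r) for r in raw_events))  # 15.0
print({activity_view(r) for r in raw_events})    # {1}
```

Neither view is "the" schema; each is one lens over the same untouched records, which is exactly why baking one interpretation in at write time causes friction.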

That's not to say you can't have a shared domain model between different teams. Unsurprisingly, there are also products out there that give an enterprise the ability to collaboratively define and refine shared domain models, which can then be used as a lens/schema through which to look at the data. Crucially, the domain model may shift over time, so decoupling it from the actual schema of the stored data lets the model evolve without having to go back and fix the stored data - precisely because no assumptions or interpretations were baked into the stored data itself.
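The decoupling can be shown concretely - the record shape and the `PersonV1`/`PersonV2` models below are invented for illustration. The stored records never change; only the read-time lens does, so "evolving the domain model" requires no backfill:

```python
from dataclasses import dataclass

# Sketch: the domain model is applied at read time as a lens over
# raw stored records, so it can evolve without a data migration.

stored = [  # raw, as ingested; never rewritten
    {"first": "Ada", "last": "Lovelace", "country": "UK"},
]

@dataclass
class PersonV1:      # original shared model: just a display name
    name: str

@dataclass
class PersonV2:      # the model later grows a region concept
    name: str
    region: str

def lens_v1(rec: dict) -> PersonV1:
    return PersonV1(name=f"{rec['first']} {rec['last']}")

def lens_v2(rec: dict) -> PersonV2:
    # A new interpretation of the same stored bytes; no backfill needed.
    return PersonV2(name=f"{rec['first']} {rec['last']}", region=rec["country"])

print(lens_v1(stored[0]))  # PersonV1(name='Ada Lovelace')
print(lens_v2(stored[0]))  # PersonV2(name='Ada Lovelace', region='UK')
```

If `country` had instead been interpreted and collapsed into the record at write time, adding the region concept later would have meant rewriting history.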
