top | item 46059242

Vendor lock-in vs. open metadata architecture? What works?

2 points| wey-gu | 3 months ago |medium.com

5 comments

wey-gu|3 months ago

I was reading an article earlier today, and it brought me back to a question I’ve heard over and over again in real data/infra teams: Do we just accept vendor lock-in because it’s convenient, or do we take the pain and build an open, multi-engine metadata stack?

For context (not my product, just what triggered the thought): https://medium.com/p/35cc5b15b24e

I’m not trying to argue Gravitino vs. UC here; I’m more interested in the architectural mindset behind these two approaches.

On the vendor-integrated side, the upsides are obvious:

- smoother UX
- one place for lineage/policies
- fewer moving parts

But so are the downsides:

- cost keeps creeping up
- you end up tied to one engine/format
- migrations basically don’t happen in real life

And on the open/composable side:

- Spark/Trino/Flink/Ray all first-class
- Iceberg/Hudi/Delta can actually coexist
- metadata isn’t tied to compute

But again:

- inconsistent metadata models everywhere
- no unified governance layer
- someone eventually owns a pile of glue code forever

So I’m curious: what actually works in practice? If your company had to make this choice:

- Did you go all-in on a vendor, or build something open?
- Did the decision age well after a year or two?
- Has anyone actually avoided metadata sprawl without getting locked in?
- Where do lineage, ACLs, policies, and the “source of truth” actually live in your setup?

Really interested in what folks think, especially if you're juggling multiple engines, table formats, and clouds.

iFire|3 months ago

My take, from working on the free and open-source Godot Engine and on 3D format metadata, is that the main difference is whether people have the knowledge of how the process works, or have had it transferred to them.

If you lost the knowledge and are substituting a library (vendor) for that knowledge, you have to rewrite that library to understand its gaps and how to update it.

gusye|3 months ago

I’ve seen teams struggle on both sides of this.

The vendor route feels great at the beginning with clean UX and fewer moving parts, until costs creep up or you suddenly need an engine or table format the platform doesn’t really support.

The open route gives you freedom, but then you’re managing multiple catalogs, inconsistent metadata models, and a bunch of glue code that nobody planned for but that still ends up living forever. Gravitino seems to be tackling the “one catalog vs many catalogs” issue.

Where do lineage and ACLs actually live in your setup? I’m genuinely curious how people are handling this today.