I was reading an article earlier today, and it brought me back to a question I’ve heard over and over again in real data/infra teams:
Do we just accept vendor lock-in because it’s convenient,
or do we take the pain and build an open, multi-engine metadata stack?
For context (not my product, just what triggered the thought):
https://medium.com/p/35cc5b15b24e
I’m not trying to argue Gravitino vs. UC here — I’m more interested in the architectural mindset behind these two approaches.
On the vendor-integrated side, the upsides are obvious:
smoother UX
one place for lineage/policies
fewer moving parts
But so are the downsides:
cost keeps creeping up
you end up tied to one engine/format
migrations basically don’t happen in real life
And on the open/composable side:
Spark/Trino/Flink/Ray all first-class
Iceberg/Hudi/Delta can actually coexist
Metadata isn’t tied to compute
But again:
inconsistent metadata models everywhere
no unified governance layer
someone eventually owns a pile of glue code forever
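To make the "pile of glue code" point concrete, here is a rough sketch of the kind of normalization layer that tends to appear once two catalogs describe tables differently. Everything in it is hypothetical: the `TableRef` model, the two input record shapes, and the function names are illustration only and do not correspond to any real catalog's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Engine-neutral description of a table (hypothetical common model)."""
    catalog: str
    schema: str
    name: str
    table_format: str  # e.g. "iceberg", "hudi", "delta"
    owner: str


def from_three_level(record: dict) -> TableRef:
    # Assumed source shape (hypothetical):
    # {"full_name": "catalog.schema.table", "format": "DELTA", "owner": "..."}
    catalog, schema, name = record["full_name"].split(".")
    return TableRef(catalog, schema, name, record["format"].lower(), record["owner"])


def from_two_level(record: dict, catalog: str) -> TableRef:
    # Assumed source shape (hypothetical):
    # {"db": "...", "table": "...", "params": {"table_type": "ICEBERG", "owner": "..."}}
    # This shape has no catalog level, so the caller has to supply one.
    params = record.get("params", {})
    return TableRef(
        catalog,
        record["db"],
        record["table"],
        params.get("table_type", "unknown").lower(),
        params.get("owner", "unknown"),
    )


def build_index(refs):
    """Index tables by fully qualified name; raise if two sources disagree."""
    index = {}
    for ref in refs:
        key = f"{ref.catalog}.{ref.schema}.{ref.name}"
        if key in index and index[key] != ref:
            raise ValueError(f"conflicting metadata for {key}")
        index[key] = ref
    return index
```

Each new engine or catalog adds another `from_*` adapter and another set of defaults to argue about, which is how this kind of layer ends up living forever.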
So I’m curious: what actually works in practice?
If your company had to make this choice:
Did you go all-in on a vendor, or build something open?
Did the decision age well after a year or two?
Has anyone actually avoided metadata sprawl without getting locked in?
Where do lineage, ACLs, policies, and the “source of truth” actually live in your setup?
Really interested in what folks think, especially if you're juggling multiple engines, table formats, and clouds.
wey-gu|3 months ago
iFire|3 months ago
My take, from working on the free and open-source Godot Engine and on 3D format metadata, is that the main difference is whether people have the knowledge of how the process works, or have had it transferred to them.
If you lost the knowledge and are substituting a library (vendor) for that knowledge, you have to rewrite that library to understand its gaps and how to update it.
gusye|3 months ago
The vendor route feels great at the beginning with clean UX and fewer moving parts, until costs creep up or you suddenly need an engine or table format the platform doesn’t really support.
The open route gives you freedom, but then you’re managing multiple catalogs, inconsistent metadata models, and a bunch of glue code nobody planned for but still ends up living forever. Gravitino seems to be tackling the “one catalog vs many catalogs” issue.
Where do lineage and ACLs actually live in your setup? I’m genuinely curious how people are handling this today.