(no title)
adrianbr | 2 years ago
Our experience comes from startups that usually do not have time to track down the knowledge and rather go out and find/make their own. Here you definitely want evolution with alerts before curation - so load to raw, and curate from there. Picking out data out of something without a schema is called "schema on read" and you can read about its shortcomings. So this is both robust and practical.
For the fine tuning, as I mentioned, data contracts are a PR review and some tweaks away. They will be highly configurable between strict, rule based evolution, or free evolution. Definitely use alerts for curation of evolution events!
nerdponx|2 years ago
Have you considered a hybrid solution, something that generates a contract from a large corpus of data, which can then be deployed statically?
I consider "responding to change" as a somewhat different scenario from "heterogeneous but not changing". So statically generating a contract from an existing corpus supports the latter.
I could also envision some kind of graceful degradation, where you have a static contract, but you have dynamic adjustments instead of outright failures if the data does not conform to that contract.
vasa_|2 years ago