artwr | 3 years ago

> Infrastructure as code is not the norm. Most tools are UI-focused. It's the equivalent of setting up your infra via the AWS UI.

> Version Control is not a first class concept

Of course, I may have worked in all the wrong places, but all but one of the places I've worked over the past ten years had source control for data pipelines, or the ability to set them up via config or version-controlled code rather than UIs.

> - Prod/Staging/Dev environments are not the norm

Fairly true, though in some cases staging/dev environments for data require more footprint and investment than they do for backend or frontend development.

> DRY and component re-use is exceedingly difficult (how many times did you walk into a meeting where 3 people had 3 different definitions of the same metric?)

That's a hard one and I agree that's where a lot of opportunity is. There are several efforts to get at a more semantic layer / metric catalog where the people who care about the metrics can agree on the definition, but that's more of an organizational issue, not a data engineering issue.

Proper data modeling to ensure you can more easily reuse the metric as needed is also core here.
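To make the "one agreed definition" idea concrete, here is a rough sketch of a tiny metric catalog. Everything in it (`Metric`, `CATALOG`, `register`, `evaluate`, the `conversion_rate` example) is made up for illustration and is not any particular tool's API. The point is only that a metric gets defined once, and a second, competing definition is rejected instead of silently coexisting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """A single, named metric definition (hypothetical structure)."""
    name: str
    description: str
    compute: Callable[[List[Row]], float]


# Hypothetical in-memory catalog; a real one would live in version control.
CATALOG: Dict[str, Metric] = {}


def register(metric: Metric) -> None:
    """Add a metric, refusing a second definition under the same name."""
    if metric.name in CATALOG:
        raise ValueError(f"metric {metric.name!r} is already defined")
    CATALOG[metric.name] = metric


def evaluate(name: str, rows: List[Row]) -> float:
    """Compute a metric by name so every caller uses the same definition."""
    return CATALOG[name].compute(rows)


# Illustrative definition: total orders divided by total sessions.
register(Metric(
    name="conversion_rate",
    description="orders divided by sessions",
    compute=lambda rows: sum(r["orders"] for r in rows)
    / sum(r["sessions"] for r in rows),
))
```

Anyone who needs the number calls `evaluate("conversion_rate", rows)` rather than re-deriving it. Getting people to agree on what goes in the catalog is still the organizational part.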

> - API Interfaces are rarely explicitly defined, and fickle when they are (the hot name for this nowadays is "data contracts")

That's another hard issue. The way I see it, it's still going to be a mix between nicely defined contracts and much looser logging that the DE still has to try to shape into something useful, sometimes even successfully.
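As a minimal sketch of what the "nicely defined" end of that spectrum could look like: a contract is just an agreed set of fields and types, and records are checked against it before anything downstream depends on them. The field names and the `violations` helper are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical contract: field name -> expected Python type.
CONTRACT: Dict[str, type] = {"user_id": int, "event": str, "ts": float}


def violations(record: Dict[str, Any],
               contract: Dict[str, type] = CONTRACT) -> List[str]:
    """Return a list of human-readable contract violations for one record."""
    problems: List[str] = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected.__name__}, got {actual}"
            )
    return problems
```

A record that honors the contract returns an empty list. The looser logging is whatever comes back non-empty, and that is the part the DE still has to shape by hand.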

> - unit/integration/acceptance testing is not as nearly as ubiquitous as it is in software

I take slight issue with "ubiquitous". The amount of software (from paid vendors, no less) I have interacted with that lacks proper acceptance/integration testing is just plain sad.
