top | item 19710984

mremes | 6 years ago

I'd just go and write out the technical architecture: define what the inputs are (the raw data) and what the outputs are (matrices for training, testing, etc.) on different intervals (usually, data scientists want the previous days' data processed into some format, A/B test results and such), and how you're going to instrument those transformations. It's not just SQL, but also the DB where that SQL would be run, plus orchestration (for example with Apache Airflow), with the concrete ETL tasks (nodes in a processing graph) implemented using a combination of open-source modules (usually in Python) and Bash scripts.
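To make the "nodes in a processing graph" idea concrete, here's a minimal sketch of an ETL dependency graph run in topological order. This is a toy stand-in for what an orchestrator like Airflow does (which additionally handles scheduling, retries, backfills, and distribution); the task names and data are hypothetical.

```python
# Toy ETL processing graph: each node is a task, edges are dependencies.
# Orchestrators like Apache Airflow provide this plus scheduling/retries;
# this only illustrates the graph-of-tasks idea. Task names are made up.
from graphlib import TopologicalSorter

def extract_raw(ctx):
    ctx["raw"] = [1, 2, 3, 4]                       # stand-in for pulling raw data

def transform_features(ctx):
    ctx["features"] = [x * 2 for x in ctx["raw"]]   # toy transformation step

def load_training_matrix(ctx):
    ctx["matrix"] = ctx["features"]                 # stand-in for writing the training matrix

# Map task name -> (callable, list of upstream dependencies)
TASKS = {
    "extract_raw": (extract_raw, []),
    "transform_features": (transform_features, ["extract_raw"]),
    "load_training_matrix": (load_training_matrix, ["transform_features"]),
}

def run(tasks):
    # Build name -> predecessors graph and execute in dependency order
    graph = {name: deps for name, (_, deps) in tasks.items()}
    ctx = {}
    for name in TopologicalSorter(graph).static_order():
        tasks[name][0](ctx)
    return ctx

if __name__ == "__main__":
    print(run(TASKS)["matrix"])  # prints [2, 4, 6, 8]
```

In a real setup each node would be an Airflow operator (or a Bash script) and the "context" would be files or tables rather than an in-memory dict, but the shape of the architecture is the same.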

It takes time and experience to get good at explaining these things and mapping them to the domain.
