framebit | 2 years ago
I've described ML Engineering as putting the "science" in Data Science because we help introduce reproducibility. For example, I can take your model training and make it a robust process that happens over a huge amount of data on a daily basis with all the monitoring, logging, and reliability stuff surrounding that.
Some topics I would personally want to see for an ML Engineer on my team (and again, "ML Engineer" has less of a solid definition across the industry than "frontend engineer" or other roles that have been around longer):
- Docker: can you containerize your code? Can you interact with a local container?
- Model serving: at a basic level, can you wrap an API around a model? There's lots more systems design stuff here if you want to go deeper on model serving platforms.
- CI/CD: do you know what Jenkins does (or an equivalent)? Can you talk about a coherent code testing strategy for ML code? How would you deploy a model service using a system like Jenkins?
- Cloud stuff: you don't need to be an expert, but can you interact with cloud APIs directly or through Terraform, spin up instances, and explain the difference between object storage and databases? Do you have some Kubernetes experience (run a pod, get the logs, take some debugging steps when something's wrong)?
- Modern MLOps: model registry systems like MLflow, feature stores (DIY preferred but vendors OK).
- Scheduling and pipelining: Airflow, Vertex Pipelines; lots of options here but those are the biggies. Know how to use these for basic data pipelines, model training, and service deployment, and why and how you can deploy these via CD.
- Monitoring: know the difference between system metrics (CPU usage, etc.) and model metrics (data drift, etc.), and have strategies for monitoring both.
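To make the "model metrics (data drift, etc)" item concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), for a single numeric feature. PSI is my choice for illustration, not something the list prescribes, and the function name, the equal-width binning, and the `eps` floor are all assumptions of this sketch rather than any library's API.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference (training-time) sample and a live sample.

    Hypothetical helper for illustration; bins are equal-width and are
    derived from the reference sample only.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Live values outside the reference range land in the edge buckets.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor empty buckets at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bucket_fractions(reference)
    live_frac = bucket_fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))
```

In a pipeline you would compute this per feature on a schedule and alert when it crosses a threshold you have calibrated on your own data. A system metric like CPU usage tells you the service is healthy; a number like this tells you whether the inputs still look like what the model was trained on.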
A lot of this stuff is harder to learn on your own because it often comes up in the context of larger teams and enterprise scale, where monitoring and reliability turn into KPIs that execs look at, but this is, to me, the stuff that defines the difference between a Data Scientist and an ML Engineer.
varane|2 years ago