top | item 21519014


__erik | 6 years ago

What has worked fairly well so far:

Models:

- Models are structured as Python packages; each model inherits from a base class

- base class defines how to train and how to predict (as well as a few other, more specific things)

- ML engineers can override the model serialization methods; the default is just pickle
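The base-class contract described above could look roughly like this. This is a minimal sketch under stated assumptions: `BaseModel`, the toy `MeanModel` subclass, and the pickle default are illustrative names, not the author's actual code.

```python
import pickle
from abc import ABC, abstractmethod

class BaseModel(ABC):
    """Hypothetical base class that every model package inherits from."""

    @abstractmethod
    def train(self, data):
        """Fit the model on training data."""

    @abstractmethod
    def predict(self, features):
        """Return predictions for a batch of inputs."""

    # Default serialization is plain pickle; an ML engineer can
    # override save/load for models that need something else.
    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            return pickle.load(f)

class MeanModel(BaseModel):
    """Toy subclass: predicts the mean of the training targets."""

    def train(self, targets):
        self.mean = sum(targets) / len(targets)

    def predict(self, features):
        return [self.mean] * len(features)
```

The point of the shared interface is that training jobs and serving code only ever call `train`, `predict`, `save`, and `load`, regardless of which model they are handling.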

Infra:

- Code is checked into GitHub; a Docker container is built on each merge into master

- Use SageMaker BYO container to train models; each job gets a job_id that represents the state produced by that job (code + data used)
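One way to make a job_id represent "code + data used" is to derive it from both. The source doesn't say how their ids are built, so this is only a sketch of the idea, with hypothetical names:

```python
import hashlib

def make_job_id(git_sha: str, data_snapshot: str) -> str:
    """Hypothetical job_id scheme: a short digest over the code version
    (git SHA of the merged commit / built container) and an identifier
    for the training data snapshot. The same code + data always maps
    to the same id, so the id pins down reproducible state."""
    digest = hashlib.sha256(f"{git_sha}:{data_snapshot}".encode()).hexdigest()
    return digest[:12]
```

Anything downstream (endpoints, batch prediction jobs) can then refer to a model purely by its job_id.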

Inference / deployment:

- Deploy models as HTTP endpoints (SageMaker or internal) using a job_id

- Have a service that centralizes all score requests: it finds the correct current endpoint for a model, emits score events to Kinesis, and tracks the health of model endpoints

- A/B test either in scoring service or in product depending on requirements

- Deploy prediction jobs using a job_id and a data source (usually SQL) that can be configured to output data to S3 or our data warehouse
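The scoring-service idea above (route to the current endpoint, emit an event, optionally split traffic for an A/B test) can be sketched like this. All names here are hypothetical, and the Kinesis producer and HTTP call are stubbed out as plain callables:

```python
import random
import time

class ScoringService:
    """Minimal sketch of a centralized scoring service."""

    def __init__(self, emit_event):
        # model name -> list of (job_id, traffic weight)
        self.endpoints = {}
        # stand-in for a Kinesis producer (e.g. a put_record wrapper)
        self.emit_event = emit_event

    def set_endpoint(self, model, job_id, weight=1.0):
        self.endpoints.setdefault(model, []).append((job_id, weight))

    def score(self, model, payload, invoke):
        variants = self.endpoints[model]
        # weighted random choice doubles as a simple A/B split
        job_id = random.choices([v[0] for v in variants],
                                weights=[v[1] for v in variants])[0]
        result = invoke(job_id, payload)  # call the model's HTTP endpoint
        self.emit_event({"model": model, "job_id": job_id,
                         "ts": time.time(), "result": result})
        return result
```

Because every score request passes through one place, logging, endpoint routing, and experiment assignment all live in a single service instead of in each product.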

So far this has been pretty solid for us. The tradeoff is that there's a step between notebook and production for ML engineers, which can slow them down, but it forces code review and increases the number of tests checked in.


aaron-santos | 6 years ago

> The tradeoff is that there's a step between notebook and production for ML engineers, which can slow them down, but it forces code review and increases the number of tests checked in.

This was a game-changer for us. What does your testing story look like?

__erik | 6 years ago

We're still ironing out a few things, but: unit tests for various functions, plus small, statistically representative datasets for each model. In CI we train a model on the small dataset (aiming for <5 minutes end-to-end), then run a suite of model metrics we care about; tests confirm the values are within acceptable bounds, and the test values are pulled into the PR.
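The metrics gate in that CI step can be sketched as a small bounds check. This is a hedged illustration, not the author's code; the metric names and bounds are made up:

```python
def check_metrics(metrics: dict, bounds: dict) -> list:
    """Hypothetical CI gate: compare each model metric against an
    inclusive (lower, upper) bound and return the list of failures.
    An empty result means the trained model passed."""
    failures = []
    for name, (lo, hi) in bounds.items():
        value = metrics[name]
        if not (lo <= value <= hi):
            failures.append((name, value, (lo, hi)))
    return failures
```

In CI this would run right after training on the small dataset; the computed metric values (pass or fail) are what get surfaced on the PR.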