item 14782938

Ask HN: How do you version control your neural nets?

42 points | mlejva | 8 years ago

When I started working with neural nets I instinctively reached for git. Soon I realised that git wasn't working for me. Working with neural nets seems far more empirical than working with a 'regular' project, where you have a very specific feature (e.g. a login feature): you create a branch, implement the feature, merge it into your develop branch, and move on to the next feature.

The same approach doesn't work with neural nets for me. There's 'only' one feature you want to implement - you want your neural net to generalise better/generate better images/etc. (depending on the type of problem you are solving). This is very abstract, though. One often doesn't even know what the solution is until you empirically tweak several hyperparameters and watch the loss and accuracy. This makes the branch model impossible to use, I think. Consider this: you create a branch where you want to use convolutional layers, for example. Then you find out that your neural net performs worse. What should you do now? You can't merge this branch into your develop branch, since it's basically a 'dead end' branch. On the other hand, when you delete this branch you lose the information that you've already tried this model. This also produces a huge number of branches, since there is an enormous number of combinations for your model (e.g. convolutional layers may yield better accuracy when used with a different loss function).

I've ended up with a single branch and a text file where I manually log all models I have tried so far and their performance. This creates nontrivial overhead though.

13 comments

btown | 8 years ago
If your neural net config is in a relatively standalone file, or you can mark it with a special comment block, you could have your test runner actually read the source file, regex it out, and concat the source block, date, current git SHA, and performance metrics into a "neural_runs.txt" file. If something else about your data pipeline is changing as well, e.g. filter settings on your image preprocessing, you can throw that in there too.

If you check this in, then every commit will include the diff of everything you tried to get there alongside the final source file, and additionally that file will serve as a single historical record for everything you tried for all time. Asking yourself a month later "did I ever try cross entropy" is as easy as grepping the file.

Heck, you could insert into a database as well if you really wanted to, and visualize your performance changes over time a la http://isfiberreadyyet.com/ . Sky's the limit.
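A minimal sketch of this idea (the marker comments, file name, and record format here are my own assumptions, not anything btown specified):

```python
import datetime
import re
import subprocess

def log_run(source_path, metrics, log_path="neural_runs.txt"):
    """Append the marked config block, date, git SHA, and metrics to a log file."""
    src = open(source_path).read()
    # Assumes the net config is fenced by these marker comments (hypothetical).
    m = re.search(r"# BEGIN NET CONFIG\n(.*?)# END NET CONFIG", src, re.DOTALL)
    config = m.group(1) if m else "<no config block found>\n"
    # Current commit, so each record ties back to the exact source state.
    sha = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                         capture_output=True, text=True).stdout.strip()
    with open(log_path, "a") as f:
        f.write(f"--- {datetime.date.today()} @ {sha or 'no-git'}\n")
        f.write(config)
        f.write(f"metrics: {metrics}\n")
```

Checking `neural_runs.txt` in alongside the source then gives you the grep-able historical record described above.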

kixiQu | 8 years ago
I am very interested to see what people's answers are for this, because I pine for a version control system designed for the twists and turns of experimental investigation rather than the needs of engineering implementation. I very much suspect that some sort of structured approach to one's commit messages might be key, and a careful mapping of DAG concepts to experimental ones--branching as the modification of an independent variable, with a base commit selected as the control point of comparison? Would one want to be able to rebase in order to compare against a different point? What would the semantics of merges represent?
cityhall | 8 years ago
I've been trying to do this better recently after having some non-reproducible results. I've settled on taking all hyperparameters (including booleans like whether to use batch norm) from a global dict. Instead of commenting and uncommenting lines, I look up a key with a default value, adding the default to the dict if it wasn't there. Then I print and log the dict with the results.

I end up with a bunch of code like:

    if get_param('use_convnet_for_thing1', True):
        convnet1_params = get_param('convnet1_params', None)
        thing1 = build_convnet(thing1_input, convnet1_params)
    elif ...

By logging the hyperparameter dict, source checkpoint, and random seed, results should be reproducible.

This works well for rapid iteration like in jupyter notebooks. For models that take days to train, you might as well use source control for your scripts.
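A minimal sketch of what such a `get_param` could look like (the global dict name and the JSONL logging are assumptions on my part, not cityhall's actual code):

```python
import json

HYPERPARAMS = {}  # global dict holding every hyperparameter a run touched

def get_param(key, default):
    """Look up a hyperparameter, registering the default if it wasn't preset."""
    if key not in HYPERPARAMS:
        HYPERPARAMS[key] = default
    return HYPERPARAMS[key]

def log_results(results, path="run_log.jsonl"):
    """Dump the full dict next to the results so the run can be reproduced."""
    with open(path, "a") as f:
        f.write(json.dumps({"params": HYPERPARAMS, "results": results}) + "\n")
```

The nice property is that a parameter you never explicitly set still ends up in the logged dict with the default value that was actually used.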

dwhitena | 8 years ago
Great questions and discussions. I'm definitely passionate about versioning in the context of models and data science for both data and code. I work full time on the open source Pachyderm project (pachyderm.io), and we have users versioning their data and models in our system. Basically, you can output checkpoints, weights, etc. from your modeling and have that data versioned automatically in Pachyderm. Then if you utilize that persisted model in a data pipeline for inference, you can have total provenance over which versions of which models created which results (and which training data was used to create that version of the model, etc.).
taroth | 8 years ago
Shameless plug: https://hyperdash.io

I got tired of maintaining one-off scripts to do recording, so I started working with friends on a dedicated solution. Today it lets you stream logs via a small Python library, then view individual training runs on an iOS/Android app. Takes less than a minute to get set up.

We're planning on expanding to model versioning in the next few weeks. Interesting to see how others are thinking about it. If you have model versioning thoughts you don't feel like posting here, drop me a note at [email protected]

agitator | 8 years ago
Maybe write a shell macro to pull accuracy and error into the commit message along with your comment on the changes. You could also automatically create a branch when your test results are worse than before: if you hit a dead end on that branch and realise the experiment didn't pan out, you can head back to where you branched; if the end result works, you can merge back into your starting branch.
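The commit-message half of this could be sketched in Python rather than shell (the message format and function names here are invented for illustration):

```python
import subprocess

def format_commit_message(note, accuracy, loss):
    """Fold the run's metrics into the commit message body."""
    return f"{note}\n\naccuracy: {accuracy:.4f}\nloss: {loss:.4f}"

def commit_run(note, accuracy, loss):
    """Commit all tracked changes with the metrics recorded in the message."""
    message = format_commit_message(note, accuracy, loss)
    subprocess.run(["git", "commit", "-am", message], check=True)
```

Running `git log` then doubles as a chronological experiment log with metrics attached to every commit.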
rpedela | 8 years ago
Is there any value to the code in failed attempts or do you just want a log of things you have tried?

If the former, you could try a single experiment branch and use tags to denote different experiments. Add a tag when you finish an experiment, then overwrite with your changes for the next experiment and repeat. This would keep all the changes while not having a huge number of dead branches, and the branch could be merged when necessary.

If the latter, why not an experiment log that is checked in which has a similar form to a change log? Or maybe create an issue and branch for each experiment then update the issue with results and delete the branch?

p1esk | 8 years ago
Why would you manually log your models? In my NN experiments, I automatically write the list of all hyperparameter values and the corresponding performance to a file. In addition, I automatically generate and save graphs showing the results, typically one graph per nested 'for' loop.
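The loop-and-log harness described here might look something like this (the `run_grid` name and JSONL record format are my own, not p1esk's actual setup):

```python
import itertools
import json

def run_grid(train_fn, grid, log_path="results.jsonl"):
    """Try every hyperparameter combination and append each result to a log."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_fn(**params)  # user-supplied training function
        with open(log_path, "a") as f:
            f.write(json.dumps({"params": params, "score": score}) + "\n")
```

Since every run appends its full parameter dict, "did I ever try this combination?" becomes a one-line grep or a quick pass over the JSONL file.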
kungito | 8 years ago
Why is it so bad if you have many branches?
mlejva | 8 years ago
It has several disadvantages: (1) it creates nontrivial overhead in your workflow - you're basically creating git branches every few lines of code (I guess part of it could be automated somehow). (2) It feels like overkill to create a whole new branch just to change, for example, a single variable. (3) A lot of those branches would remain "unfinished". By that I mean they would exist simply to inform you that you have tried this model of neural net. You couldn't merge them into anything, since those changes would make your net perform worse. (4) If you wanted to see the code of some specific model you had tried before, you would always need to switch to a different branch. This makes for a bulky workflow.
andbberger | 8 years ago
Not just for neural nets - balancing experimentation against building reusable tools is probably the biggest logistical challenge in scientific programming in general.

I've converged to a workflow where I maintain a library with a main project pipeline and reusable tools for the project, and do all scripting with jupyter (all notebooks version controlled).

I've found that machine learning projects can be pretty effectively parametrized with config dicts for data, training and the model. Each type of config gets its own pipelined method that does all of the library calls - pipeline_batch_gen, pipeline_train, pipeline_build_model.

Example of a poorly organized config from a project:

    model_config = {
        'optimizer': optimizer,
        'clip_grad': clip_grads,
        'name': model_name,
        'residual': residual,
        'n_conv_filters': n_conv_filters,
        'n_output_hus': n_output_hus,
        'activation': activation,
        'batch_norm': batch_norm,
        'output_bn': output_bn,
        'generation': generation,
        'data_spec': {
            'uniform_frac': uniform_frac,
            'include_augment': True,
            'batch_size': batch_size,
            'bulk_chunk_size': bulk_chunk_size,
            'max_bulk_chunk_size': max_bulk_chunk_size,
            'loss_weighter': loss_weight
        },
        'train_spec': {
            'early_stopping_patience': early_stopping_patience,
            'lr_plateau_patience': lr_plateau_patience,
            'learning_rate': init_lr,
            'clip_grads': clip_grads,
            'partial_weight': partial_weight
        }
    }

I've wanted to give Sacred a try https://github.com/IDSIA/sacred - looks promising but haven't tried yet so can't comment.

I still tend to keep track of model performance by hand, though. But I always have the notebooks I can go back to for reference. This is something Sacred could help a lot with.

Another very non-trivial aspect of this kind of work is the compute/storage infrastructure you need to scale beyond a single workstation.

We have a nice system here where $HOME lives on NFS and gets mounted when you log into any machine on the network - I can hardcode paths in my code and count on every worker having the same filesystem. I can't imagine how we would do distributed jobs without NFS. That's not a very realistic solution for homegamers though - you need a very fast network and expensive commodity hardware. And sys admins.

Does anyone have a solution for that half of the problem? I've seen a number of merkle-tree based data version control solutions recently...