top | item 25150738


m0sth8 | 5 years ago

Congratulations on the launch.

We've used https://github.com/iterative/dvc for a long time and are quite happy with it. What's the main difference between replicate.ai and dvc?


bfirsh|5 years ago

Thanks!

DVC is closely tied to Git. We've heard people find that quite heavyweight when you're running experiments.

We think we can build a much better experience if we detach ourselves from Git. With Replicate, you just run your training script as usual, and it automatically tracks everything from within Python. You don't have to run any additional commands to track things.

DVC is really good for storing data sets though, and we see potential for integration there: https://github.com/replicate/replicate/issues/359
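
To make the "tracks everything from within Python" idea concrete, here is a minimal sketch using only the standard library. It is not Replicate's actual API: the `Experiment` class, its `checkpoint` method, and the on-disk layout (`params.json` plus `checkpoints.jsonl`) are all hypothetical names chosen for illustration. The point is only that the training script records its own parameters and metrics, with no extra commands to run.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class Experiment:
    """Hypothetical in-process tracker; NOT Replicate's real API."""

    def __init__(self, root, params):
        self.id = uuid.uuid4().hex[:8]
        self.dir = Path(root) / self.id
        self.dir.mkdir(parents=True, exist_ok=True)
        # Record the hyperparameters once, when the run starts.
        (self.dir / "params.json").write_text(json.dumps(params))
        self.log = self.dir / "checkpoints.jsonl"

    def checkpoint(self, step, metrics):
        # Append one JSON line per checkpoint so earlier records are never rewritten.
        record = {"step": step, "time": time.time(), "metrics": metrics}
        with self.log.open("a") as f:
            f.write(json.dumps(record) + "\n")


def train(root, learning_rate=0.1, epochs=3):
    exp = Experiment(root, {"learning_rate": learning_rate, "epochs": epochs})
    loss = 1.0
    for epoch in range(epochs):
        loss *= 1 - learning_rate  # stand-in for a real training step
        exp.checkpoint(step=epoch, metrics={"loss": loss})
    return exp


if __name__ == "__main__":
    # Run the "training script as usual"; tracking happens as a side effect.
    with tempfile.TemporaryDirectory() as root:
        exp = train(root)
        print(exp.log.read_text())
```

In this sketch the storage root is a local directory; pointing it at shared storage instead would be a separate design decision.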

gidim|5 years ago

Hey! I'm one of the founders at Comet.ml. We believe that Git should continue to be the approach for managing code (similar to dvc) but we adapted it to the ML workflow. Our approach is to compute a git patch on every run so later you can 'git apply' if you'd like (https://www.comet.ml/docs/user-interface/#the-reproduce-butt...).
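
For readers wondering what "compute a git patch on every run" could look like in practice, here is a rough sketch. It is not Comet's implementation: the `snapshot_run` helper and the `run.patch` file name are made up for illustration, and it relies only on the standard `git rev-parse HEAD` and `git diff HEAD` commands. It assumes `git` is on the PATH and that the repository has at least one commit.

```python
import subprocess
from pathlib import Path


def snapshot_run(repo_dir, out_path):
    """Write a patch of uncommitted changes to out_path and return the HEAD commit hash.

    Hypothetical helper for illustration only.
    """

    def git(*args):
        result = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout

    # The commit the working tree is based on.
    commit = git("rev-parse", "HEAD").strip()
    # Staged and unstaged changes to tracked files, relative to that commit.
    # Untracked files are not included in this diff.
    patch = git("diff", "HEAD")
    Path(out_path).write_text(patch)
    return commit
```

To restore the code for a run later, one would check out the recorded commit and then run `git apply run.patch` in the repository.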

shcheklein|5 years ago

Hey, one of the DVC maintainers here!

TL;DR: I think it should be compared with the upcoming DVC feature - https://github.com/iterative/dvc/wiki/Experiments . Stay tuned - it'll be released very soon but you can try it now in beta.

First of all, congrats on the launch! I do really like the aesthetics of the website, and the overall approach. It resonates with our vision and philosophy!

Good feedback on experiments feeling heavyweight! In previous DVC versions we've been focused on building a solid foundation for managing data and pipelines, and we were aware of this problem (https://github.com/iterative/dvc/issues/2799). As I mentioned, the Experiments feature is already in beta testing. It means that users don't have to make commits until they are ready and can still share experiments (it's a long topic and we'll write a blog post at some point, since I'm really excited about the way it'll be implemented using custom Git refs). It also adds support for DL workflows (auto-checkpoints), and more. Would love to discuss and share any details; it would be great to compare the approaches.

mwnivek|5 years ago

I'd be curious about a comparison with https://github.com/mlflow/mlflow

bfirsh|5 years ago

We talked to a bunch of MLflow users, and the general impression we got is that it is heavyweight and hard to set up. MLflow is an all-encompassing "ML platform". Which is fine if you need that, but we're trying to just do one thing well. (Imagine if Git called itself a "software platform".)

In terms of features, Replicate points directly at an S3 bucket (so you don't have to run a server and Postgres DB), it saves your training code (for reproducibility and to commit to Git after the fact), and it has a nice API for reading and analyzing your experiments in a notebook.

edolev|5 years ago

Congrats on the launch! This looks exciting. My company has been using Comet.ml and they cover a few use cases that are missing here. Specifically, things like real-time visualizations and sharing experiments, which are key when working in a team. Are you planning on adding those?

fagerhult|5 years ago

Thank you! We have an issue on the roadmap for adding a web GUI: https://github.com/replicate/replicate/issues/295

We haven't thought about it in great detail yet, so I'd be curious to hear your thoughts and ideas if you'd like to add a comment to that issue!