m0sth8 | 5 years ago
We've used https://github.com/iterative/dvc for a long time and are quite happy with it. What's the main difference between replicate.ai and DVC?
bfirsh|5 years ago
DVC is closely tied to Git. We've heard that people find that quite heavyweight when they're running experiments.
We think we can build a much better experience if we detach ourselves from Git. With Replicate, you just run your training script as usual, and it automatically tracks everything from within Python. You don't have to run any additional commands to track things.
DVC is really good for storing data sets though, and we see potential for integration there: https://github.com/replicate/replicate/issues/359
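To make the "tracks everything from within Python" idea concrete, here is a minimal sketch of the general pattern. It is not Replicate's actual API: the `Experiment` class, its `checkpoint` method, and the on-disk layout are hypothetical names made up for illustration, and the "training loop" just produces a fake loss.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class Experiment:
    """Hypothetical in-process tracker: one directory per run, no Git involved."""

    def __init__(self, root, params):
        self.id = uuid.uuid4().hex[:8]
        self.dir = Path(root) / self.id
        self.dir.mkdir(parents=True)
        # Record the hyperparameters once, when the run starts.
        meta = {"id": self.id, "created": time.time(), "params": params}
        (self.dir / "experiment.json").write_text(json.dumps(meta))

    def checkpoint(self, step, metrics):
        # Append one JSON line per checkpoint.
        with open(self.dir / "checkpoints.jsonl", "a") as f:
            f.write(json.dumps({"step": step, "metrics": metrics}) + "\n")


def train(root):
    """Stand-in for an ordinary training script; the loss is a placeholder."""
    exp = Experiment(root, params={"learning_rate": 0.01, "epochs": 3})
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # not a real model
        exp.checkpoint(step=epoch, metrics={"loss": loss})
    return exp


root = tempfile.mkdtemp()
exp = train(root)
lines = (exp.dir / "checkpoints.jsonl").read_text().splitlines()
print(len(lines), "checkpoints recorded in", exp.dir)
```

The point of the sketch is that the only tracking step is a couple of calls inside the script itself; nothing has to be committed or run separately afterwards.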
gidim|5 years ago
shcheklein|5 years ago
TL;DR: I think Replicate should be compared with the upcoming DVC Experiments feature (https://github.com/iterative/dvc/wiki/Experiments). Stay tuned: it'll be released very soon, but you can already try it in beta.
First of all, congrats on the launch! I do really like the aesthetics of the website, and the overall approach. It resonates with our vision and philosophy!
Good feedback on experiments feeling heavyweight! In previous DVC versions we focused on building a solid foundation for managing data and pipelines, and we were aware of this problem (https://github.com/iterative/dvc/issues/2799). As I mentioned, the Experiments feature is already in beta testing. It means that users don't have to make commits until they are ready, but can still share experiments (it's a long topic, and we'll write a blog post at some point, since I'm really excited about the way it'll be implemented using custom Git refs). It also brings support for DL workflows (auto-checkpoints), and more. I'd love to discuss and share details; it would be great to compare the approaches.
mwnivek|5 years ago
bfirsh|5 years ago
In terms of features, Replicate points directly at an S3 bucket (so you don't have to run a server and Postgres DB), it saves your training code (for reproducibility and to commit to Git after the fact), and it has a nice API for reading and analyzing your experiments in a notebook.
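As a rough illustration of those three ideas (plain file storage with no server, a snapshot of the training code, and reading runs back for analysis), here is a small sketch. It is an assumption-laden toy, not Replicate's implementation: a local directory stands in for the S3 bucket, and `save_run`, `list_runs`, `best_run`, and the file layout are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def save_run(store, run_id, params, metrics, code_path):
    """Write one run as plain files under store/run_id (hypothetical layout)."""
    run_dir = Path(store) / run_id
    run_dir.mkdir(parents=True)
    record = {"id": run_id, "params": params, "metrics": metrics}
    (run_dir / "run.json").write_text(json.dumps(record))
    # Snapshot the training code next to the results for reproducibility.
    shutil.copy(code_path, run_dir / "train.py")


def list_runs(store):
    """Load every run by listing the store; no server or database needed."""
    return [json.loads(p.read_text()) for p in sorted(Path(store).glob("*/run.json"))]


def best_run(runs, metric, maximize=True):
    """Pick the run with the best value of the given metric."""
    pick = max if maximize else min
    return pick(runs, key=lambda r: r["metrics"][metric])


work = Path(tempfile.mkdtemp())
script = work / "train.py"
script.write_text("print('training')\n")  # placeholder training script
store = work / "bucket"  # a local directory standing in for an S3 bucket

save_run(store, "run-a", {"lr": 0.1}, {"accuracy": 0.81}, script)
save_run(store, "run-b", {"lr": 0.01}, {"accuracy": 0.87}, script)

best = best_run(list_runs(store), "accuracy")
print(best["id"], best["params"])
```

In a notebook, the list returned by `list_runs` could be handed to whatever analysis tooling you already use, since each run is just a dictionary.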
edolev|5 years ago
fagerhult|5 years ago
We haven't thought about it in great detail yet, so I'd be curious to hear your thoughts and ideas if you'd like to add a comment to that issue!