saamm | 7 years ago

This is interesting! It sounds like this v1 gets your local environment up and running in a Docker container. I maintain something similar for analysts on my team, and we've seen success in terms of decreasing time spent on environment setup.

As another interesting use of Docker in the data space, I'm excited about Pachyderm [0] (though I haven't had the chance to use it in production). In particular, the data provenance story seems compelling.

0: https://github.com/pachyderm/pachyderm

jdoliner | 7 years ago

Thanks for the plug, saamm. I'm one of the creators of Pachyderm. I think Torus and Pachyderm would work very nicely together. With just a few commands, you could go straight from developing code in the image Torus provides to deploying it on Pachyderm as a production pipeline that runs on new data as it comes in. Similarly, their Dockerized data science cookie-cutter could work nicely as a Pachyderm service. This would work similarly to using the service on your laptop, except that you could easily deploy it on a cloud provider, schedule it with GPUs, and have it updated with new data as it comes in.

Very exciting to see more people applying containers to data science.

sdeymanifold | 7 years ago

Yes to containers! We are trying to make it as seamless as possible to be Docker-first in all things, without reinventing the DevOps wheel. It just needs to be adapted for the needs of data scientists. Pachyderm is really cool; I will have to check it out. We've recently moved to Airflow for all our pipeline management... how does Pachyderm fit into that ecosystem?