Interesting approach to passing data between steps and constructing the overall graph - it will be interesting to see what the take rate is between the two approaches (Sematic's and Metaflow's). On the UI front, Metaflow generates viz for all objects by default in @card; but how does Sematic package up the PyTorch code referenced in the example (https://docs.sematic.dev/real-example) for execution on the cloud? IIRC, Metaflow packages the cwd (in addition to @conda, @pip, etc.) and relies on existing packages for local execution?

Edit: Digging deeper, Sematic relies on Bazel (https://docs.sematic.dev/execution-modes#dependency-packagin...) and needs a BUILD file to specify all the dependencies for cloud execution. It seems that the entire pipeline will execute as a single (or multiple) k8s pod(s) using the same environment?
I am quite interested in trying out Sematic. Any guidelines on what kind of scale Sematic can support today (and the near future)?
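(For anyone curious about the "passing data between steps" approach I mean: calls to decorated step functions return future-like objects instead of executing eagerly, so nesting calls builds the pipeline graph. Here is my own toy sketch of the idea in plain Python - not Sematic's actual implementation, and the names `Future`, `step`, and `resolve` are just illustrative:)

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Future:
    """A node in the pipeline graph: a function plus its (possibly lazy) inputs."""
    fn: Callable
    inputs: List[Any]

    def resolve(self) -> Any:
        # Resolve upstream futures depth-first, then run this step.
        args = [a.resolve() if isinstance(a, Future) else a for a in self.inputs]
        return self.fn(*args)


def step(fn: Callable) -> Callable:
    # Calling a @step function returns a Future node rather than a value,
    # so nested calls assemble a DAG instead of executing immediately.
    def wrapper(*args):
        return Future(fn, list(args))
    return wrapper


@step
def add(a: float, b: float) -> float:
    return a + b


# add(add(1, 2), 3) builds a two-node graph; resolve() walks and executes it.
result = add(add(1, 2), 3).resolve()  # -> 6
```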
josh-sematic | 3 years ago
> It seems that the entire pipeline will execute as a single (or multiple) k8s pod(s) using the same environment?
A single docker image, but multiple pods (when you are using the full cloud mode). This was an intentional decision to avoid confusion around what things could be imported in what places (mimicking more closely what it would be like in one Python instance), and also to avoid weird version inconsistencies across the pipeline.
> Any guidelines on what kind of scale Sematic can support today (and the near future)?
Based on some prior tooling experience, the main bottleneck should be whatever your Kubernetes cluster can handle.
> I am quite interested in trying out Sematic
Glad to hear it! We'd love to hear about your experiences. You can join our discord if you want help while you're trying it out: https://discord.com/invite/4KZJ6kYVax