item 44883281

Why Metaflow?

15 points | savin-goyal | 6 months ago | docs.metaflow.org

6 comments

thomasingalls|6 months ago

What do people do to curate/version/transform their raw datasets these days? I am vaguely aware of the "chuck it all into s3" strategy for hanging onto raw data, and related strategies where instead of s3 it's a db of some flavor. What are folks doing for record-keeping of what today's raw data contains vs tomorrow's?

And the next step - a curated dataset has a time-bound provenance - what are folks doing to keep track of the transformation/cleaning steps that make the raw data useful at the time it's being processed? Does this bit fall under the purview of Metaflow, or is this different tooling?

Or maybe my assumptions are off base! Curious about what other teams are doing with their datasets.
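One lightweight, tool-agnostic sketch of the record-keeping half of this question (stdlib only; the `snapshot`/`diff` helpers and file layout are illustrative, not any particular library's API): hash each raw file and write a dated manifest, so "today's raw data vs tomorrow's" becomes a diff of two small JSON files.

```python
import hashlib
import json
import time
from pathlib import Path


def snapshot(raw_dir: str, manifest_dir: str) -> Path:
    """Record what the raw dataset contains right now: one SHA-256 per file."""
    entries = {}
    for f in sorted(Path(raw_dir).rglob("*")):
        if f.is_file():
            entries[str(f.relative_to(raw_dir))] = hashlib.sha256(
                f.read_bytes()
            ).hexdigest()
    out = Path(manifest_dir) / f"manifest-{time.strftime('%Y%m%d-%H%M%S')}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return out


def diff(old_manifest: Path, new_manifest: Path) -> dict:
    """What changed between two snapshots: added / removed / modified paths."""
    old = json.loads(Path(old_manifest).read_text())
    new = json.loads(Path(new_manifest).read_text())
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }
```

The manifest can then ride along with any curated output (in s3 or a db), which covers provenance: the cleaned dataset points back at the exact raw bytes it was derived from.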

patcon|6 months ago

I'm exploring Kedro and Kedro-Viz lately, in case that's in the vicinity of your question. It ties most closely with MLflow for artifacts, but storing locally works fine too.

ghilston|6 months ago

I was a big fan of Metaflow a few years back. I thought it was neat how I could write some code and easily run some functions locally versus remote.

Hey Savin, it's been a while since we chatted. I hope things are going well ;)

For those unaware, Savin is one of the co-creators of Metaflow.

marksimi|6 months ago

Curious to hear from folks who have used both Metaflow and Kubeflow to understand some of those tradeoffs.

Seems like Metaflow is comparatively lightweight, a bit more tightly integrated with AWS, less end-to-end, and a bit more agile.