zenlikethat | 10 months ago
1. Great Python support. Piping something from a structured data catalog into Python is trivial, and so is persisting results. With materialization, you never need to recompute something in Python twice if you don’t want to — you can store it in your data catalog forever.
Also, you can request any Python package you want, and even use different Python versions and packages in different workflow steps.
2. Catalog integration. Safely make changes and run experiments in branches.
3. Efficient caching and data re-use. We do a ton of tricks behind the scenes to avoid recomputing or rescanning things that have already been done, and we pass data between steps as zero-copy Arrow tables. This means your DAGs run a lot faster, because very little time is spent shuffling bytes around.
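To make the "never recompute twice" idea concrete, here is a rough stdlib-only sketch of step-level caching. It is not how the product actually does it: `CACHE_DIR`, `cache_key`, and `run_step` are made-up names, and a temp directory of pickles stands in for a real data catalog. The idea is just to key each step's result on its name and parameters, then reuse the stored result on a hit.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical on-disk store standing in for a data catalog (assumption).
CACHE_DIR = Path(tempfile.mkdtemp(prefix="step_cache_"))


def cache_key(step_name, params):
    """Derive a stable key from the step name and its JSON-serializable params."""
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name, func, params):
    """Return (result, was_cached); call func only on a cache miss."""
    path = CACHE_DIR / (cache_key(step_name, params) + ".pkl")
    if path.exists():
        return pickle.loads(path.read_bytes()), True
    result = func(**params)
    path.write_bytes(pickle.dumps(result))  # "materialize" the result
    return result, False


calls = []  # records every real execution of the step


def total_sales(rows, region):
    calls.append(region)
    return sum(amount for r, amount in rows if r == region)


rows = [["eu", 10], ["us", 5], ["eu", 7]]
first = run_step("total_sales", total_sales, {"rows": rows, "region": "eu"})
second = run_step("total_sales", total_sales, {"rows": rows, "region": "eu"})
```

The second call never runs `total_sales`; it reads the stored result back. A real system would also have to fold the step's code and upstream data versions into the key, which this sketch skips.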