kumare3 | 6 years ago

Great question. I am working on a follow-up blog post that will explain the differences in more detail. Flyte does take some inspiration from Airflow, but it has a lot of important differences:

- Flyte natively understands data flow between tasks. This is achieved using its own type system, defined in protobuf.
- Flyte tasks are first-class citizens, so they can be shared and reused, and they are always associated with an interface declaration.
- Flyte is container and Kubernetes native. It is also multi-tenant.
- Flyte's cron scheduler, control plane API, and the actual execution engine are decoupled. Each workflow can be independently executed on a different execution engine.
- Flyte workflows are purely specification, defined in protobuf, and so are Flyte tasks.
- Flyte provides an event stream of the execution.
- Since Flyte is aware of the data, it comes with built-in memoization and auto-cataloging.
- Like Airflow, Flyte can have plugins in Python, but it supports a richer plugin interface.
- Flyte is written in Golang and runs on top of Kubernetes.

It is definitely less mature in open source, so please help us make it better. But it has been battle tested at Lyft for more than 3 years in production.
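To make the "aware of the data, so it can memoize" point concrete, here is a rough sketch in plain Python. This is not Flyte's API: `Interface`, `Task`, `cache_key` and the in-memory `_CACHE` are hypothetical names made up for illustration, and it assumes inputs are JSON-serializable. The idea is only that once a task declares a typed interface, the system can check inputs and key a result cache on (task name, version, inputs).

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Interface:
    """Hypothetical typed interface declaration for a task."""
    inputs: Dict[str, type]
    outputs: Dict[str, type]


# Hypothetical in-memory result store; a real system would persist this.
_CACHE: Dict[str, Any] = {}


def cache_key(name: str, version: str, inputs: Dict[str, Any]) -> str:
    """Derive a deterministic key from the task identity and its inputs."""
    payload = json.dumps(
        {"task": name, "version": version, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Task:
    """A task that validates inputs against its interface and memoizes results."""

    def __init__(self, name: str, version: str, interface: Interface,
                 fn: Callable[..., Dict[str, Any]]):
        self.name = name
        self.version = version
        self.interface = interface
        self.fn = fn
        self.calls = 0  # counts real executions, not cache hits

    def __call__(self, **kwargs: Any) -> Dict[str, Any]:
        # Reject missing or unexpected inputs.
        if set(kwargs) != set(self.interface.inputs):
            raise TypeError(f"{self.name}: expected inputs {sorted(self.interface.inputs)}")
        # Reject inputs whose type does not match the declaration.
        for key, expected in self.interface.inputs.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{self.name}: input {key!r} must be {expected.__name__}")

        key = cache_key(self.name, self.version, kwargs)
        if key in _CACHE:
            return _CACHE[key]  # cache hit: skip execution entirely

        self.calls += 1
        result = self.fn(**kwargs)
        _CACHE[key] = result
        return result


add = Task(
    name="add",
    version="v1",
    interface=Interface(inputs={"a": int, "b": int}, outputs={"sum": int}),
    fn=lambda a, b: {"sum": a + b},
)
```

Calling `add(a=1, b=2)` twice runs the function once; the second call is served from the cache. Bumping `version` changes the key, which is one simple way to invalidate old results.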

mmq|6 years ago

It is quite interesting to hear this; these are very much the same observations that came up while working with some customers. Airflow is a very mature and amazing tool, but it does not have good state/artifact management, which leaves users tweaking around it. Scheduling is centralised, and it is not designed for ML workflows (hyperparameter tuning, distributed runs, ...). The Kubernetes support is also quite limited.

Polyaxon[0] took a similar approach to Flyte for authoring specifications: a strongly typed system in protobuf + an intuitive YAML specification + SDKs in Python/Golang/Java/... It also treats operations (tasks in Flyte) as first-class citizens and allows running them in a serverless way. Users can choose to register repetitive operations as components and share them with a description and typed inputs/outputs.

[0] https://github.com/polyaxon/polyaxon

zerovar|6 years ago

Has Lyft migrated all its workflows from Airflow to Flyte? Or does Airflow still play a role alongside Flyte? I was assuming Lyft is running workflows in Airflow, based on this post: https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8f...

kumare3|6 years ago

Another great question. Airflow is used at Lyft for ETL, and I think it is still a good fit for traditional ETL. But there is an effort to not just migrate, but to rethink how we can leverage Flyte's capabilities to improve our ETL experience.

But as things stand, we have a FlyteAirflowOperator, so that users can easily connect their Airflow pipelines with Flyte and write new ones on Flyte alone.

Stay tuned for developments on this front :)