

davinchia | 4 years ago

Airflow is a general orchestration tool that fits into the Python stack extremely well. It wasn't built to scale, though, so once you want to run something more often than once a second, you are going to be jumping through hoops.

My experience with Dataflow is 1.5 years old, so things might have changed, but it felt more like a unified, simplified Hadoop/Spark framework. It unifies the batch and streaming concepts but is still pretty low-level.

Within ELT (or ETL), Airflow/Dataflow can fulfil all three components: extract, load and transform.

Airbyte focuses just on EL (though we have basic T functionality around normalisation). Our intention is to leave T to the warehouse: warehouses like Redshift/Snowflake/BigQuery are extremely powerful these days, and tools like DBT give users more flexibility to recombine and consume the raw data than a specialised ELT pipeline does.
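To make the "EL first, T in the warehouse" split concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for a real warehouse like Snowflake or BigQuery, and the table names, columns and sample rows are all hypothetical. It is not how Airbyte or DBT are implemented; it only shows the shape of the idea: land the raw data untouched, then derive cleaned tables from it with SQL afterwards.

```python
import sqlite3

# Hypothetical source records standing in for an upstream API or database.
SOURCE_ROWS = [
    {"id": 1, "email": " Alice@Example.com ", "amount_cents": 1250},
    {"id": 2, "email": "bob@example.com", "amount_cents": 300},
    {"id": 3, "email": "ALICE@example.com", "amount_cents": 700},
]


def extract():
    """E: pull raw records from the source without reshaping them."""
    return list(SOURCE_ROWS)


def load(conn, rows):
    """L: land the records as-is in a raw table (sqlite3 stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_payments "
        "(id INTEGER, email TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_payments (id, email, amount_cents) "
        "VALUES (:id, :email, :amount_cents)",
        rows,
    )
    conn.commit()


def transform(conn):
    """T: done later, inside the warehouse, in plain SQL (the kind of job a DBT model does)."""
    conn.execute("DROP TABLE IF EXISTS payments_by_customer")
    conn.execute(
        """
        CREATE TABLE payments_by_customer AS
        SELECT LOWER(TRIM(email)) AS customer,
               SUM(amount_cents) / 100.0 AS total_dollars
        FROM raw_payments
        GROUP BY LOWER(TRIM(email))
        """
    )
    return conn.execute(
        "SELECT customer, total_dollars FROM payments_by_customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, extract())
    print(transform(conn))
```

Because raw_payments is never modified, a different transformation can be written later against the same landed data without re-extracting anything, which is the flexibility argument for keeping T out of the pipeline.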

In summary, I would say Airbyte is a specialised subset of Airflow/Dataflow, and it's possible to use Airbyte with either tool, though I'd guide someone towards DBT.
