tgdn | 4 years ago
Define the minimum/maximum number of nodes and the machine capacity (RAM/CPU), and let Spark handle the scaling for you.
It gives you a Jupyter-like runtime for working on potentially massive datasets. That said, Spark is perhaps more than you need. Kubernetes could also be paired with Airflow/DBT, for example for ETL/ELT pipelines.
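For a rough idea, here is a minimal sketch of that kind of setup using Spark's dynamic allocation settings (the executor sizes and min/max counts are assumed, illustrative values, not a recommendation):

    from pyspark.sql import SparkSession

    # Sketch: fix the per-executor capacity (RAM/CPU) and the scaling
    # bounds, then let Spark add/remove executors within those bounds.
    spark = (
        SparkSession.builder
        .appName("autoscaling-sketch")  # hypothetical app name
        .config("spark.executor.memory", "8g")  # RAM per executor (assumed)
        .config("spark.executor.cores", "4")    # CPUs per executor (assumed)
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "2")   # lower bound
        .config("spark.dynamicAllocation.maxExecutors", "20")  # upper bound
        # Needed on Spark 3+ if no external shuffle service is running:
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .getOrCreate()
    )

On a managed platform the same bounds are usually set in the cluster config rather than in code, but the knobs are the same: executor size plus a min/max count.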
ekns | 4 years ago