Show HN: Kestra - Open-Source Airflow Alternative
142 points | tchiotludo | 4 years ago | github.com
A few years ago I created AKHQ (renamed from KafkaHQ), a successful open-source project: https://github.com/tchiotludo/akhq. It has been adopted by big companies like Best Buy, Pipedrive, BMW, Decathlon and many more: 2,300 stars, 120 contributors, 10M Docker downloads, much more than I expected.
Now let's talk about Kestra, an infinitely scalable orchestration and scheduling platform for creating, running, scheduling and monitoring millions of complex pipelines.
I started the project 30 months ago, and I'm even prouder of this one: it required a lot of investment and time to build what I hope is the future of data pipelines. The result is now ready to be presented, and I hope to get some feedback from you, the HN community.
To have a fully scalable solution, we chose Kafka as our database (of course I love Kafka, if you didn't know) along with Elasticsearch, Micronaut, and more. It can be deployed on Kubernetes, on VMs, or on premises.
You may think there are many alternatives in this area, but we decided to take a different road: a descriptive (low-code) approach to building your pipelines, which lets you edit them directly from the web interface and deploy them to production with Terraform. We paid a lot of attention to scalability and performance, which has already allowed us to run a big production workload at a large French retailer: Leroy Merlin.
Since Kestra's core is plugin-based, many plugins are available from the core team, but you can easily create your own.
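To give an idea of the descriptive approach, here is what a minimal flow looks like in YAML (an illustrative sketch: the task and trigger type identifiers follow the docs at the time of writing and may differ between versions):

```yaml
# A minimal Kestra flow, declared in YAML rather than code.
# Type identifiers are illustrative; check the plugin docs
# for the exact names in your version.
id: hello-world
namespace: io.kestra.demo

tasks:
  - id: log-message
    type: io.kestra.core.tasks.log.Log
    message: Hello from Kestra!

triggers:
  - id: every-morning
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 6 * * *"
```

The whole flow is data, which is what makes editing from the web UI and deploying with Terraform possible.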
More information:
- official website: https://kestra.io/
- Medium post: https://medium.com/@kestra-io/introducing-kestra-infinitely-...
- the project: https://github.com/kestra-io/kestra
Your comments are more than welcome, thank you!
chockchocschoir | 4 years ago
tchiotludo | 4 years ago
emteycz | 4 years ago
Is there a way to use this as a managed service?
Are you looking for independent partners/integrators?
tchiotludo | 4 years ago
In the meantime, we provide different installation options:
- Docker Compose: https://kestra.io/docs/administrator-guide/deployment/docker...
- Kubernetes: https://kestra.io/docs/administrator-guide/deployment/kubern...
- Jar: https://kestra.io/docs/administrator-guide/deployment/manual...
Kestra is not so complicated to install; for Kafka and Elasticsearch, you could use a managed service from Amazon or Aiven, for example.
But rest assured that we will provide a managed service as soon as possible.
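For a first local try, a heavily trimmed Compose sketch could look like the following. This is an assumption-laden sketch only: the official Compose file linked above also wires up Kafka and Elasticsearch, which Kestra needs, and is the one to actually use.

```yaml
# Sketch only: the real docker-compose in the docs also starts
# Kafka, Zookeeper and Elasticsearch for the Kestra backend.
version: "3.6"
services:
  kestra:
    image: kestra/kestra:latest
    command: server standalone
    ports:
      - "8080:8080"
```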
dantetheinferno | 4 years ago
tchiotludo | 4 years ago
Compared to other workflow engines (Dagster, Prefect, ...), we decided to take a completely different approach to building a pipeline. While the others use Python code, we went with a descriptive language (like Terraform, for example). This has a lot of advantages for the developer experience: with Kestra, you can use the web UI directly to edit, create and run your flows; there's no need to install anything on the user's desktop, and no need for a complex deployment pipeline to test on the final instance. Another advantage is that it lets you use Terraform to deploy your flows. A typical development workflow is: in the development environment, use the UI; in production, deploy your resources with Terraform, flows and all the other cloud resources together.
Also, it would be really nice to have some independent performance benchmarks. I really think Kestra is fast, since it is based on a queue system (Kafka) and not a database. Since workflows are only events (status changes, new tasks, ...) that need to be consumed by different services, a database doesn't seem to be a good choice, and my benchmarks show that Kestra can handle a lot of concurrent tasks without using much CPU.
unknown | 4 years ago
[deleted]
idomi | 4 years ago
rajandatta | 4 years ago
Have not seen the participants - how many contributors do you have?
tchiotludo | 4 years ago
They trusted the project and decided to go to production with Kestra, so they chose to invest some resources to develop features they needed that were missing.
But basically, not that many people for now. We are trying to build a community around the product and only started communicating about it a few weeks ago. I hope the community will follow us, and that we succeed like my other open source project: https://github.com/tchiotludo/akhq
crubier | 4 years ago
tchiotludo | 4 years ago
Kestra (and likewise Airflow) is more of a workflow manager for data pipelines: moving large datasets (batch) between different sources and destinations, doing transformations inside the database (ELT), or, with Kestra, also transforming the data itself (ETL) before saving it to external systems.
This leads Kestra (and Airflow) to have a lot of connectors to different systems (SQL, NoSQL, columnar databases, cloud storage, ...) that are ready to use out of the box.
temporal.io, since it was first designed to handle microservices (proprietary & internal services), doesn't have these connectors out of the box, and you will need to code all these interactions yourself.
So my opinion:
Building data pipelines that interact with many standard systems will be done easily & quickly with Kestra (or Airflow).
Handling internal business processes of microservices will be done easily with temporal.io.
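To make the connector point concrete, an ELT-style flow chaining an HTTP extract and an in-database transform might look like this (a sketch: the task type identifiers and the `envs` templating variable are assumptions based on my reading of the plugin catalog, and may differ between versions):

```yaml
# Illustrative ELT flow. Task type identifiers and the envs
# templating variable are assumptions, not verified API names.
id: daily-elt
namespace: io.kestra.demo

tasks:
  - id: extract
    type: io.kestra.core.tasks.http.Download
    uri: https://example.com/orders.csv

  - id: transform-in-db
    type: io.kestra.plugin.jdbc.postgresql.Query
    url: jdbc:postgresql://warehouse:5432/analytics
    username: etl
    password: "{{ envs.db_password }}"
    sql: |
      INSERT INTO orders_clean
      SELECT * FROM orders_raw WHERE amount > 0;
```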
jusonchan81 | 4 years ago
hackerdad | 4 years ago
tlrobinson | 4 years ago
tchiotludo | 4 years ago
It runs easily on a standard laptop.
awild | 4 years ago
sackerhews | 4 years ago
tchiotludo | 4 years ago
sirjaz | 4 years ago
tchiotludo | 4 years ago
For the desktop app, I don't know; building one with Electron could be simple, but a full app is not on the roadmap for now. What is your use case?
speedgoose | 4 years ago
tchiotludo | 4 years ago
You have three options for that:
- you can run the task with the runner: DOCKER property and choose the image: https://kestra.io/plugins/core/tasks/scripts/io.kestra.core....
- you can also use PodCreate to launch a pod on a Kubernetes cluster: https://kestra.io/plugins/plugin-kubernetes/tasks/io.kestra....
- you can also use CustomJob from Vertex AI on GCP to launch a container on an ephemeral cluster (with any CPU / GPU): https://kestra.io/plugins/plugin-gcp/tasks/vertexai/io.kestr...
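For the first option, the task definition looks roughly like this (a sketch; the authoritative field names are on the linked plugin page):

```yaml
# Script task executed inside a Docker container (sketch;
# see the linked plugin docs for the exact schema).
- id: py-in-docker
  type: io.kestra.core.tasks.scripts.Python
  runner: DOCKER
  dockerOptions:
    image: python:3.10-slim
  inputFiles:
    main.py: |
      print("running inside an ephemeral container")
```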
wokwokwok | 4 years ago
… but your core dependencies are a Kafka cluster and an Elasticsearch cluster, which are both a pain in the ass to scale; so really, could you run this seriously without a really expensive hosted cloud instance of both of those?
This kind of wording:
> Since the application is a Kafka Stream, the application can be scale infinitely
Is a major turn off to me.
Kafka cannot scale infinitely. Nothing can. In fact, Kafka can be a pain in the ass to scale.
It makes me question some of the other commentary on the project.
wpietri | 4 years ago
> Kafka cannot scale infinitely. Nothing can.
It is very common that when a phrase can't be literally true, it signals a metaphorical meaning. E.g., if a teen tells you their new teacher is a million years old, it's not a literal statement of age. Similarly, nobody expects "scale infinitely" to mean, as in Universal Paperclips, that we'll be converting whole galaxies into Kestra clusters. It means that any bottlenecks are external to the system.
tchiotludo | 4 years ago
I agree with you that Kafka & Elasticsearch can be a pain to scale when you need both horizontal and vertical scaling.
On the other hand, on a single machine, it's really easy to set up. With that, you have the same scaling as Airflow, for example, since Airflow depends on a non-scalable database (MySQL or PostgreSQL). But the advantage you get with Kestra is that you can scale the backend to multiple nodes (and Kestra allows scaling all its services as well). When you hit the limit of a standard database, you are stuck.
And yes, clearly "infinite scale" is not a literal statement; nothing can scale infinitely, but since the architecture is really robust (and scalable), the issues will be on aspects other than Kestra (cloud limits, database overload, ...).
A final and more important point: the backends are all pluggable in Kestra, since Kestra is really designed as modules. Look at the directories here: https://github.com/kestra-io/kestra :
- runner-kafka & runner-memory are two implementations of Kestra's runner; you can add a new one that uses Redis, Pulsar, ...
- repository-elasticsearch & repository-memory are the same; you can implement another one. I started an implementation for JDBC that I haven't had the time to finish yet: https://github.com/kestra-io/kestra/pull/368
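Concretely, swapping backends is just configuration. Based on the administrator guide, the relevant keys look something like this (shown as an assumption; check the deployment docs for the exact names in your version):

```yaml
# Assumed configuration keys for selecting pluggable backends.
kestra:
  queue:
    type: kafka            # runner-kafka; "memory" selects runner-memory
  repository:
    type: elasticsearch    # or "memory"
  kafka:
    client:
      properties:
        bootstrap.servers: "kafka:9092"
  elasticsearch:
    client:
      http-hosts: "http://elasticsearch:9200"
```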
jturpin | 4 years ago
mountainriver | 4 years ago
minroot | 4 years ago
tchiotludo | 4 years ago
- Because the application is built on top of Kafka, and Kafka Streams is only available in Java
- Because the Java ecosystem is very large and there are a lot of good libraries to handle many kinds of workload
- Because I love strong typing and the language (but that doesn't matter for the user, it's just a personal pleasure :D)
Hypocritelefty | 4 years ago
[deleted]
unknown | 4 years ago
[deleted]
mekster | 4 years ago
I see partial structures, then a JSON string as-is, and then some long blob of a string no one can understand, with no newlines.
What devs want is pretty simple: a structured log with a table layout, without repeating the column names on every row, which makes it insanely verbose for any human to consume.
I'm picking up bits of open source apps to build a decent solution with Vector (which has an awesome remap language, VRL, to parse strings into structured data if they aren't already) and throwing it into ClickHouse to view from Metabase.
Apparently, Kibana, Graylog and even Grafana are pretty bad at displaying logs; you can't feel even slightly comfortable reading them every day.
Logging is such a crucial part of developer life, and I'm not sure why there aren't any sane open source solutions.
tchiotludo | 4 years ago
Isn't it JSON? Or I don't understand where you see that.
Hypocritelefty | 4 years ago
[deleted]