
Show HN: Kestra - Open-Source Airflow Alternative

142 points| tchiotludo | 4 years ago |github.com

Hey HN, I'm really proud to share with you my new open source project: Kestra https://github.com/kestra-io/kestra

A few years ago, I created AKHQ (https://github.com/tchiotludo/akhq, renamed from KafkaHQ), a successful open source project that has been adopted by big companies like Best Buy, Pipedrive, BMW, Decathlon and many more: 2,300 stars, 120 contributors and 10M Docker downloads, much more than I expected.

Now let's talk about Kestra, an infinitely scalable orchestration and scheduling platform for creating, running, scheduling and monitoring millions of complex pipelines.

I started the project 30 months ago, and I'm even prouder of this one: it took a lot of investment and time to build (I hope) the future of data pipelines. It is now ready to be presented, and I'd love to get feedback from you, the HN community.

To get a fully scalable solution, we chose Kafka as our database (of course, I love Kafka, if you didn't know), along with Elasticsearch, Micronaut, ... Kestra can be deployed on Kubernetes, on VMs or on premises.

You may think there are already many alternatives in this area, but we took a different road: a descriptive (low-code) approach to building pipelines, which lets you edit them directly from the web interface and deploy them to production with Terraform. We paid a lot of attention to scalability and performance, which has already allowed a large production deployment at a big French retailer: Leroy Merlin.

Kestra's core is plugin-based: many plugins are available from the core team, and you can easily create your own.
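As a rough illustration of what a plugin-based core can mean, here is a minimal Python sketch. Kestra itself is written in Java, and the registry, the `plugin` decorator and the `io.example.*` type names below are hypothetical, not Kestra's API. The idea shown: tasks are looked up by a string `type`, so a new plugin only has to register itself with the core.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a task "type" string to its implementation.
TASK_REGISTRY: Dict[str, Callable[[dict], str]] = {}


def plugin(type_name: str):
    """Decorator that registers a task implementation under type_name."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        if type_name in TASK_REGISTRY:
            raise ValueError(f"duplicate plugin type: {type_name}")
        TASK_REGISTRY[type_name] = fn
        return fn
    return register


@plugin("io.example.core.Log")
def log_task(props: dict) -> str:
    return f"log: {props['message']}"


@plugin("io.example.core.Echo")
def echo_task(props: dict) -> str:
    return props.get("value", "")


def run_task(task: dict) -> str:
    """Dispatch a task description to the plugin registered for its type."""
    try:
        impl = TASK_REGISTRY[task["type"]]
    except KeyError:
        raise LookupError(f"no plugin registered for {task['type']!r}") from None
    return impl(task.get("properties", {}))


print(run_task({"type": "io.example.core.Log", "properties": {"message": "hi"}}))
```

The core never imports a specific plugin; it only knows the registry, which is what lets third parties add task types without touching it.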

More information:
- on the official website: https://kestra.io/
- on the Medium post: https://medium.com/@kestra-io/introducing-kestra-infinitely-...
- check out the project: https://github.com/kestra-io/kestra

Your comments are more than welcome, thank you!

69 comments

chockchocschoir|4 years ago
I just noticed the title says "Open-Source Airflow Alternative" but Airflow is already Open-Source, so shouldn't you describe it as just "Airflow Alternative"? Otherwise you make it sound like Airflow isn't Open-Source but this is.
tchiotludo|4 years ago
The title was changed by moderators and I can't edit it anymore :'(
emteycz|4 years ago
This looks incredible!

Is there a way to use this as a managed service?

Are you looking for independent partners/integrators?

tchiotludo|4 years ago
For now, we don't provide Kestra as a SaaS; it's definitely on the roadmap and is our next project.

In the meantime, we provide different installation methods:
- Docker Compose: https://kestra.io/docs/administrator-guide/deployment/docker...
- Kubernetes: https://kestra.io/docs/administrator-guide/deployment/kubern...
- Jar: https://kestra.io/docs/administrator-guide/deployment/manual...

Kestra is not that complicated to install; for Kafka and Elasticsearch, you could use Amazon's managed services or Aiven, for example.

But rest assured that we will provide a managed service as soon as possible.

dantetheinferno|4 years ago
Why is this better than Airflow, or Prefect, or Dagster?
tchiotludo|4 years ago
Airflow has design and performance issues; if you want details, you can find some of the reasons in this article: https://kestra.io/blogs/2022-02-22-leroy-merlin-usage-kestra....

For other workflow engines (Dagster, Prefect, ...), we chose a completely different approach to building a pipeline. Where the others use Python code, we went with a descriptive language (like Terraform, for example). This has a lot of advantages for the developer experience: with Kestra, you can use the web UI directly to create, edit and run your flows, with nothing to install on the user's desktop and no need for a complex deployment pipeline to test on the final instance. Another advantage is that you can use Terraform to deploy your flows. A typical development workflow is: in the development environment, use the UI; in production, deploy your resources with Terraform, flows and all the other cloud resources alike.
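To make the "descriptive language" idea concrete: the flow is data that an engine interprets, not code the user runs. Below is a minimal Python sketch; the dict stands in for a parsed YAML flow file, and the field names (`id`, `tasks`, `type`, `input`) and task types are illustrative assumptions, not Kestra's exact schema.

```python
from typing import Callable, Dict, List

# A flow described as plain data (what a YAML file would parse into).
# Field names are illustrative assumptions, not Kestra's exact schema.
FLOW = {
    "id": "hello-world",
    "tasks": [
        {"id": "extract", "type": "constant", "value": [3, 1, 2]},
        {"id": "transform", "type": "sort", "input": "extract"},
        {"id": "load", "type": "sum", "input": "transform"},
    ],
}

# The engine owns the behaviour; the flow only references task types by name.
TASK_TYPES: Dict[str, Callable[[dict, dict], object]] = {
    "constant": lambda task, outputs: task["value"],
    "sort": lambda task, outputs: sorted(outputs[task["input"]]),
    "sum": lambda task, outputs: sum(outputs[task["input"]]),
}


def validate(flow: dict) -> List[str]:
    """Return the problems found, before anything is executed."""
    errors = []
    seen = set()
    for task in flow["tasks"]:
        if task["type"] not in TASK_TYPES:
            errors.append(f"{task['id']}: unknown type {task['type']!r}")
        if "input" in task and task["input"] not in seen:
            errors.append(f"{task['id']}: input {task['input']!r} not produced earlier")
        seen.add(task["id"])
    return errors


def run(flow: dict) -> dict:
    """Execute tasks in declaration order, collecting each task's output."""
    problems = validate(flow)
    if problems:
        raise ValueError("; ".join(problems))
    outputs: dict = {}
    for task in flow["tasks"]:
        outputs[task["id"]] = TASK_TYPES[task["type"]](task, outputs)
    return outputs


print(run(FLOW))  # {'extract': [3, 1, 2], 'transform': [1, 2, 3], 'load': 6}
```

Because the flow is only data, a web UI can validate and edit it, and a tool like Terraform can ship it, without running any user code.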

Beyond that, it would be really nice to have some independent performance benchmarks. I really think Kestra is fast because it is based on a queue system (Kafka) rather than a database. Since a workflow is only a series of events (status changes, new tasks, ...) that need to be consumed by different services, a database doesn't seem to be a good choice, and my benchmarks show that Kestra is able to handle a lot of concurrent tasks without using much CPU.
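The "a workflow is only events" argument can be sketched as event sourcing: execution state is not updated in place in a table but derived by folding the events a consumer reads from a queue. Python sketch only; an in-memory `collections.deque` stands in for a Kafka topic, and the event shape and status names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict


@dataclass(frozen=True)
class Event:
    execution_id: str
    task_id: str
    status: str  # hypothetical statuses: RUNNING, SUCCESS, FAILED


@dataclass
class ExecutionState:
    tasks: Dict[str, str] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        # The latest event for a task wins: state is a fold over the log.
        self.tasks[event.task_id] = event.status

    @property
    def done(self) -> bool:
        return bool(self.tasks) and all(
            s in ("SUCCESS", "FAILED") for s in self.tasks.values()
        )


def consume(queue: Deque[Event]) -> Dict[str, ExecutionState]:
    """Drain the queue, rebuilding the state of every execution it mentions."""
    states: Dict[str, ExecutionState] = {}
    while queue:
        event = queue.popleft()
        states.setdefault(event.execution_id, ExecutionState()).apply(event)
    return states


topic: Deque[Event] = deque([
    Event("exec-1", "extract", "RUNNING"),
    Event("exec-1", "extract", "SUCCESS"),
    Event("exec-1", "load", "RUNNING"),
    Event("exec-2", "extract", "RUNNING"),
    Event("exec-1", "load", "SUCCESS"),
])
states = consume(topic)
print(states["exec-1"].done, states["exec-2"].done)  # True False
```

In a real deployment, several services would each consume the same stream for their own purpose; this sketch only shows how state falls out of the event sequence.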

idomi|4 years ago
Or Ploomber?
rajandatta|4 years ago
We all may have questions for you on some of your descriptions and choices. None of that should take away from the fact that this is a pretty impressive stage for a 30-month open source project.

Have not seen the participants - how many contributors do you have?

tchiotludo|4 years ago
The project started as a side project (yet another one I work on nights and weekends) but was quickly picked up and used by a big French retail company.

They trusted the project and decided to go to production with Kestra, so they put in some resources to develop features they needed that were missing.

But basically, not many people for now. We are trying to build a community around the product and only started communicating about it a few weeks ago; I hope the community will follow us! And I hope it succeeds like my other open source project: https://github.com/tchiotludo/akhq

crubier|4 years ago
Looks cool! How does it compare to temporal.io in your experience? I’m evaluating options at my current company, between Airflow and Temporal.
tchiotludo|4 years ago
Temporal.io is a really cool framework for building business processes such as microservice workflows (like a payment workflow: the user pays, we call the shipping microservice, the billing microservice, ...), and it is a good fit for handling individual events (lots of them).

Kestra (like Airflow) is more of a workflow manager for data pipelines: moving large datasets (batch) between different sources and destinations, or running transformations inside the database (ELT); with Kestra you can also transform the data (ETL) before saving it to external systems.

This is why Kestra (like Airflow) has a lot of connectors to different systems (SQL, NoSQL, columnar databases, cloud storage, ...) that are ready to use out of the box.

Since temporal.io was first designed to handle microservices (proprietary & internal services), it doesn't have these connectors out of the box, and you will need to code all these interactions yourself.

So my opinion:

Building data pipelines that interact with many standard systems will be easy & quick with Kestra (or Airflow).

Handling internal business processes across microservices will be easy with temporal.io.

jusonchan81|4 years ago
Netflix Conductor is a great alternative to Temporal. There is a fully managed offering for it as well. The biggest advantage is that it’s quite simple to understand and has great visualization of flows.
hackerdad|4 years ago
Have you tried Netflix Conductor (https://github.com/Netflix/conductor)? If you are evaluating Airflow, this could be a great alternative: it scales well and gives you the option to write your workflows in code as well as config.
tlrobinson|4 years ago
These kinds of tools seem to be meant to scale up well, but are there good ones that “scale down” to small projects too?
awild|4 years ago
Airflow is relatively easy to set up once you get the hang of it. At its most basic, it needs three containers (server, SQL database, executor) plus your DAG definitions, which are very straightforward Python code.
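At its core, a DAG definition is a set of tasks plus their upstream dependencies, executed in topological order. The sketch below illustrates that idea with the standard library's `graphlib` (Python 3.9+) rather than Airflow's own `DAG`/operator API; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it (its upstreams).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_batches(graph):
    """Group tasks into batches whose members could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(execution_batches(dependencies))
# [['extract'], ['transform', 'validate'], ['load']]
```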
sackerhews|4 years ago
Cool. But please fix that light gray text on white background in the demo (or make it even paler for more avant-garde :)
tchiotludo|4 years ago
Do you have a screenshot, please? I didn't notice where. Thanks
sirjaz|4 years ago
Are there any plans for a desktop app for Kestra or the ability to support Windows Server outside of docker?
tchiotludo|4 years ago
Supporting Windows Server should be easy, I think. Since it's Java underneath, most of the API already works on Windows. We just need to create a custom task for Windows; I added it to the backlog: https://github.com/kestra-io/kestra/issues/519

As for a desktop app, I don't know; building one with Electron could be simple, but a full app is not on the roadmap for now. What is your use case?

speedgoose|4 years ago
Can you run software containers as steps?
tchiotludo|4 years ago
Yes, of course!

You have 3 options for that:

- you can use this task with the runner:DOCKER property and choose the image: https://kestra.io/plugins/core/tasks/scripts/io.kestra.core....

- you can also use PodCreate to launch a pod on a Kubernetes cluster: https://kestra.io/plugins/plugin-kubernetes/tasks/io.kestra....

- you can also use CustomJob from Vertex AI on GCP to launch a container on an ephemeral cluster (with any CPU / GPU): https://kestra.io/plugins/plugin-gcp/tasks/vertexai/io.kestr...
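What a "run this step in a container" runner does can be approximated by building a `docker run` command line. The Python sketch below only constructs the argument list (so it runs without Docker installed); the `build_docker_command` and `run_step` helpers are hypothetical and are not how Kestra implements its DOCKER runner. `--rm`, `-e` and `-w` are standard `docker run` flags.

```python
import shlex
import subprocess
from typing import Dict, List, Optional


def build_docker_command(
    image: str,
    command: List[str],
    env: Optional[Dict[str, str]] = None,
    workdir: Optional[str] = None,
) -> List[str]:
    """Build argv for a throwaway container that runs one step (hypothetical helper)."""
    argv = ["docker", "run", "--rm"]  # --rm: delete the container when it exits
    for key, value in sorted((env or {}).items()):
        argv += ["-e", f"{key}={value}"]
    if workdir:
        argv += ["-w", workdir]
    return argv + [image] + command


def run_step(argv: List[str]) -> str:
    """Run the container and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


argv = build_docker_command("python:3.11-slim", ["python", "-c", "print(42)"], env={"STAGE": "dev"})
print(shlex.join(argv))
# docker run --rm -e STAGE=dev python:3.11-slim python -c 'print(42)'
```

Keeping command construction separate from execution makes the step easy to inspect or test; on a machine with Docker, `run_step(argv)` would actually start the container.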

wokwokwok|4 years ago
You’re basically pitching this as a more complicated version of airflow that does basically the same thing, but slightly differently, and scales better?

… but your core dependencies are a Kafka cluster and an elastic search cluster which are both a pain in the ass to scale; so really, could you run this seriously without a really expensive hosted cloud instance of both of those?

This kind of wording:

> Since the application is a Kafka Stream, the application can be scale infinitely

Is a major turn off to me.

Kafka cannot scale infinitely. Nothing can. In fact, Kafka can be a pain in the ass to scale.

It makes me question some of the other commentary on the project.
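One concrete reason Kafka-based systems don't scale without limit: a consumer group's parallelism is capped by the topic's partition count, because each partition is read by at most one consumer in the group. The Python sketch below models keyed partitioning and a simplified round-robin assignment; it uses `zlib.crc32` as a stand-in for the murmur2 hash that Kafka's default partitioner actually uses.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Same key -> same partition, which preserves per-key ordering."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions over consumers; each partition gets exactly one owner."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# 6 partitions, 8 consumers: two consumers get nothing to do.
plan = assign(6, [f"worker-{i}" for i in range(8)])
idle = [c for c, parts in plan.items() if not parts]
print(idle)  # ['worker-6', 'worker-7']
```

Adding consumers beyond the partition count buys nothing, and raising the partition count later changes which partition a key hashes to, which is part of why scaling Kafka takes planning.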

wpietri|4 years ago
As long as we're airing pet peeves, mine is about over-literal misunderstandings:

> Kafka cannot scale infinitely. Nothing can.

It is very common that when a phrase can't be literally true, it signals a metaphorical meaning. E.g., if a teen tells you their new teacher is a million years old, it's not a literal statement of age. Similarly, nobody expects "scale infinitely" to mean that, as in Universal Paperclips, we'll be converting whole galaxies into Kestra clusters. It means that any bottlenecks are external to the system.

tchiotludo|4 years ago
I'm not pitching it as a more complicated version of Airflow; rather, I think it's simpler than Airflow on the UX side: we use declarative flows in YAML and not Python code that can be

I agree that Kafka & Elasticsearch can be a pain to scale if you need both horizontal and vertical scaling.

On the other hand, on a single machine, they are really easy to set up. With that, you get the same scaling as Airflow, for example, since Airflow depends on a non-scalable database (MySQL or Postgres). But the advantage with Kestra is that you will be able to scale your backend to multiple nodes (as well as Kestra itself, which allows scaling all its services). When you hit the limit of a standard database, you will be stuck.

And yes, clearly "infinite scale" is not meant literally; nothing can scale infinitely. But since the architecture is really robust (and scalable), the bottlenecks will be elsewhere than in Kestra (cloud limits, database overload, ...).

A final and more important point: the backends are all pluggable in Kestra, since Kestra is really designed as modules. Look at the directories here: https://github.com/kestra-io/kestra

- runner-kafka & runner-memory are 2 implementations of Kestra's runner; you can add a new one that uses Redis, Pulsar, ...

- repository-elasticsearch & repository-memory work the same way: you can implement another one. I started an implementation for JDBC that I haven't had time to finish yet: https://github.com/kestra-io/kestra/pull/368
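The module layout described above (one interface, several interchangeable backends such as repository-memory and repository-elasticsearch) is the classic interface-plus-adapters pattern. A minimal Python sketch of the repository side; the `FlowRepository` interface, its methods and the `deploy` function are hypothetical, not Kestra's Java interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class FlowRepository(ABC):
    """Storage contract the rest of the system codes against (hypothetical)."""

    @abstractmethod
    def save(self, namespace: str, flow_id: str, source: str) -> None: ...

    @abstractmethod
    def find(self, namespace: str, flow_id: str) -> Optional[str]: ...

    @abstractmethod
    def list_namespace(self, namespace: str) -> List[str]: ...


class MemoryFlowRepository(FlowRepository):
    """In-memory backend: handy for tests and single-node development."""

    def __init__(self) -> None:
        self._flows: Dict[Tuple[str, str], str] = {}

    def save(self, namespace: str, flow_id: str, source: str) -> None:
        self._flows[(namespace, flow_id)] = source

    def find(self, namespace: str, flow_id: str) -> Optional[str]:
        return self._flows.get((namespace, flow_id))

    def list_namespace(self, namespace: str) -> List[str]:
        return sorted(fid for ns, fid in self._flows if ns == namespace)


def deploy(repo: FlowRepository, namespace: str, flow_id: str, source: str) -> None:
    # Callers depend only on the interface, so a JDBC or Elasticsearch
    # implementation could be swapped in without touching this code.
    repo.save(namespace, flow_id, source)


repo = MemoryFlowRepository()
deploy(repo, "company.team", "hello", "tasks: []")
print(repo.list_namespace("company.team"))  # ['hello']
```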

jturpin|4 years ago
Elasticsearch is not a pain in the ass to scale, it is one of the easiest databases to scale. Kafka is medium, since they ditched Zookeeper.
mountainriver|4 years ago
Yeah, and ES doesn’t scale forever; also, running both of those is incredibly computationally expensive. You would really need the right use case.
minroot|4 years ago
Why Java?
tchiotludo|4 years ago
- Mostly for performance: Kestra relies heavily on Java threads to handle very large workloads

- Because the application is built on top of Kafka and Kafka Streams, which is only available for Java

- Because the Java ecosystem is very large, with a lot of good libraries for handling heavy workloads

- Because I love strong typing and the language (it doesn't matter to the user, it's just a personal pleasure :D)

mekster|4 years ago
Why is everyone ok with logs being dumped like it's in a trash can?

I see partial structures, then a raw JSON string, then some long blob of text with no newlines that no one can make sense of.

What devs want is pretty simple: structured logs in a table layout, without the column names repeated on every row, which makes them insanely verbose for any human to read.
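The table layout asked for here (column names printed once, aligned rows) is easy to produce once logs are structured. A Python sketch: it takes log records as dicts (here parsed from JSON lines) and renders a single header row; the field names are made up for the example.

```python
import json
from typing import Dict, List, Sequence


def render_table(records: List[Dict[str, object]], columns: Sequence[str]) -> str:
    """Render records as an aligned text table with a single header row."""
    rows = [[str(r.get(c, "")) for c in columns] for r in records]
    widths = [max([len(c)] + [len(row[i]) for row in rows]) for i, c in enumerate(columns)]

    def fmt(cells: Sequence[str]) -> str:
        return "  ".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()

    lines = [fmt(columns), fmt(["-" * w for w in widths])]
    lines += [fmt(row) for row in rows]
    return "\n".join(lines)


raw = [
    '{"ts": "12:00:01", "level": "INFO", "msg": "flow started"}',
    '{"ts": "12:00:02", "level": "WARN", "msg": "retrying task extract"}',
]
print(render_table([json.loads(line) for line in raw], ["ts", "level", "msg"]))
```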

I'm picking up bits of open source apps to build a decent solution: Vector (which has the awesome Vector Remap Language to parse strings into structured data when they aren't already) feeding ClickHouse, viewed from Metabase.
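Vector's remap language does this parsing step inside the pipeline. As an illustration of the same idea in Python (not VRL), here is a sketch that turns a `key=value` log line into a dict and keeps the raw text under `message` when nothing matches; the line format is an assumption.

```python
import re
from typing import Dict

# Matches key=value or key="quoted value" pairs; the format is an assumption.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> Dict[str, str]:
    """Turn a key=value log line into a dict; keep unparseable text under 'message'."""
    fields: Dict[str, str] = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1].replace('\\"', '"')  # unquote and unescape
        fields[key] = value
    return fields or {"message": line}


print(parse_line('ts=12:00:01 level=INFO msg="flow started" flow=hello'))
# {'ts': '12:00:01', 'level': 'INFO', 'msg': 'flow started', 'flow': 'hello'}
```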

Apparently, Kibana, Graylog and even Grafana are too bad at displaying logs for reading them every day to feel even slightly comfortable.

Logging is such a crucial part of a developer's life, and I'm not sure why there aren't any sane open source solutions.