motakuk | 3 years ago
Helm (https://github.com/grafana/oncall/tree/dev/helm/oncall), docker-composes for hobby and dev environments.
Besides deployment, there are two main priorities for OnCall architecture: 1) It should be as "default" as possible. No fancy tech, no hacking around 2) It should deliver notifications no matter what.
We chose the most "boring" (no offense Django community, that's a great quality for a framework) stack we know well: Django, Rabbit, Celery, MySQL, Redis. It's mature, reliable, and allows us to build a message bus-based pipeline with reliable and predictable migrations.
It's important for such a tool to be built on a message bus because it should have no single point of failure. If a worker dies, another will pick up the task and deliver the alert. If Slack goes down, you won't lose your data: the pipeline keeps delivering to the other destinations and delivers to Slack once it's back up.
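To make the "no single point of failure" claim concrete, here's a minimal sketch of the fan-out idea, not OnCall's actual code: in the real stack each attempt would be a Celery task pulled off RabbitMQ by whichever worker is alive, but plain functions show the property that one failing channel never blocks or drops the others.

```python
# Hypothetical sketch of message-bus-style fan-out delivery.
# In production each attempt would be an independent Celery task;
# here plain functions stand in for tasks.

def fan_out(alert, destinations, deliver):
    """Attempt delivery to every destination independently.

    A failure on one channel (e.g. Slack is down) never aborts the
    others; failed destinations are returned so they can be re-queued
    and retried later -- the alert itself is never lost.
    """
    failed = []
    for dest in destinations:
        try:
            deliver(dest, alert)
        except Exception:
            failed.append(dest)  # kept for retry, not dropped
    return failed
```

With a broker in between, "returned for retry" becomes "re-enqueued with a backoff", but the isolation between destinations is the same.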
The architecture you see in the repo has been live for 3+ years now. We've performed a few hundred data migrations without downtime, and had no major outages or data loss. So I'm pretty happy with this choice.
gen220 | 3 years ago
To be fair, even in its current form, it should be possible to operate this system with SQLite (i.e. no DB server) and in-process Celery workers (i.e. no RabbitMQ) if configured correctly, assuming they're not using MySQL-specific features in the app.
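For illustration, a Django settings fragment along those lines. The setting names are standard Django/Celery (`CELERY_TASK_ALWAYS_EAGER` runs tasks synchronously in-process), but this is a hypothetical configuration, not something OnCall documents or supports:

```python
# settings.py sketch: single-process hobby deployment,
# no DB server and no message broker.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",  # file-backed, no MySQL
        "NAME": "/var/lib/oncall/oncall.sqlite3",
    }
}

# Execute Celery tasks synchronously in the calling process instead of
# dispatching them through RabbitMQ to worker processes.
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True  # surface task errors immediately
```

The trade-off is exactly the one discussed above: eager tasks mean no retry-by-another-worker, so you give up the message-bus failure isolation in exchange for a simpler deployment.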
Using a message bus, a persistent data store behind a SQL interface, and a caching layer are all good design choices. I think the OP's concern is less with your particular implementations, and more with the principle of preventing operators from bringing their own preferred implementation of those interfaces to the table.
They mentioned that it makes sense because you were a standalone product, so stack portability was less of a concern. But as FOSS, you're opening yourself up to different standards on portability.
It requires some work from the maintainer to make the application tolerant of different implementations of the same interfaces. But it's good work. It usually results in cleaner separation of concerns between application logic and caching/message bus/persistence logic, for one. It also allows your app to serve a wider audience: for example, those who are locked in to using Postgres/Kafka/Memcached.
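In the Django world a lot of that separation already exists: the ORM, the cache framework, and Celery's broker all sit behind backend-agnostic settings, so portability is largely a matter of not reaching past those interfaces. A hedged sketch of what operator-chosen backends could look like (the env var names here are illustrative, not OnCall's):

```python
# settings.py sketch: operators pick implementations via env vars;
# application code only ever sees the generic interfaces.
import os

DATABASES = {
    "default": {
        # Postgres instead of MySQL -- only viable if the app avoids
        # MySQL-specific SQL and schema features.
        "ENGINE": os.environ.get("DB_ENGINE", "django.db.backends.postgresql"),
        "NAME": os.environ.get("DB_NAME", "oncall"),
    }
}

CACHES = {
    "default": {
        # Memcached instead of Redis, behind Django's cache API.
        "BACKEND": os.environ.get(
            "CACHE_BACKEND",
            "django.core.cache.backends.memcached.PyMemcacheCache",
        ),
    }
}

# Celery supports several brokers (RabbitMQ, Redis, ...) behind one URL.
# Kafka is not a drop-in broker, so the "message bus" interface only
# stretches so far without extra work.
CELERY_BROKER_URL = os.environ.get("BROKER_URL", "redis://localhost:6379/0")
```

That last comment is the real cost: the SQL and cache interfaces swap cleanly, while the message-bus layer is where "bring your own implementation" gets expensive.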
raffraffraff | 3 years ago
I get why you would add a relational DB to the mix. Personally, I'd like a Rabbit-free option.
Deritio | 3 years ago
Sorry, but why is RabbitMQ really necessary?
slotrans | 3 years ago
throwaway892238 | 3 years ago
One of them allows a thousand different nodes on different networks to share a single dataset with high availability. The other can't share data with any other application, doesn't have high availability, is constrained by the resources of the executing application node, has obvious performance limits, limited functionality, no commercial support, etc etc.
And we're talking about a product that's intended for dealing with on-call alerts. The entire point is to alert when things are crashing, so you would want it to be highly available. As in, running on more than one node.
I know the HN hipsters are all gung-ho for SQLite, but let's try to rein in the hype train.
sergiomattei | 3 years ago
They chose this stack, it works for them. They’ve put it through its paces in production.
It’s as boring as it gets.