top | item 37290078

(no title)

Kadin | 2 years ago

That explains why the K8S defaults are bad, but why use K8S at all here?

Personally it seems like a lot of shade-tree server admins are way too eager to bust out much more tooling than is actually necessary for many tasks, just because it's the way that $BIGNAME company does it.

Realistically, most people running a personal or small community's IT system aren't going to need the sort of crazy scalability that container orchestration systems are designed to provide. And if you're not running them at significant scale, the containerization and orchestration-system overhead are often quite significant fractions of the overall resource footprint. Plus there's the unnecessary complexity due to all the levels of abstraction that just don't need to be there.

E.g. for a Mastodon instance, I wouldn't touch Kubernetes or complex orchestration unless I was out of other, more traditional options for scaling. The server side is already broken into a number of components (RoR app, PosgreSQL, Redis, Sidekiq, node.js) which you can separate out onto their own servers, and from there each component has preferred ways of scaling based on need. And while nothing is safe from failure, backing up a bog-standard PostgreSQL VPS is a lot more straightforward than K8S.

If you're doing deployments dozens of times a day, of course automation is desirable. But if you're doing it once, I suspect most people would be hard-pressed to make back the investment of time and additional testing required (well, should be required) of working through a complex container-management architecture.

discuss

sofixa|2 years ago

You kind of explained it yourself. You need to run a number of components, keep track of all of them, be able to update them, scale them, make sure they're healthy and restarted when needed, etc. This is of course possible in a number of ways, but not trivial. You're basically describing an orchestrator such as Kubernetes or Nomad. Especially with an already existing Helm chart covering all the deployment logic (what needs to be deployed, how many instances, health checks, etc.): https://github.com/mastodon/chart it's quite an easy choice instead.

Kwpolska|2 years ago

Mastodon also has manual install instructions: https://docs.joinmastodon.org/admin/install/

They look fairly simple and typical. Monitoring services could be systemd's job, upgrades would involve apt + bundle + yarn.

havnagiggle|2 years ago

Well I guess I'll share my opinion too. K8s is not just for $BIGNAME companies. Personally I find k8s to be excellent for small shops because it has trivialized the install of large complex apps, opening doors for small groups to have a bigger impact. K8s is simply a package manager for complex deployments.

The learning curve for k8s may be a bit steep initially, but it covers all the areas you _should_ have some understanding of before tackling a large complex install of a web application and puts them in one place: manifests. The best Helm configs provide all the toggles you would need for complex deployments, but provide reasonable defaults (e.g. Mastodon [1]). Literally the application developers are also building the deployment _for_ you. It can't be overstated enough, because of what use is an app if you can't deploy it?

K8s and tools like ArgoCD also encourage some of the best modern practices with IaC. I don't need to hunt for every config file someone might have tinkered with to solve a problem on a server (and hopefully made a doc comment I can find to reproduce it). Every change is there in the manifests. There are OS variations, config variations, etc. and I just don't care about any of that! Just that k8s can run on it and use the resources. The infrastructure and git history is self documenting, allowing me to come into any small team and understand where they are at.

You said orchestration can take a lot of resources: meh. I trust something like k8s to use all of my compute resources more efficiently than a half dozen servers that are probably not sized correctly anyways for the individual components. Digital Ocean will run the control plane for free for example. What the orchestration overhead gives you is trivial reproduction anywhere. It really has nothing to do with "crazy scalability". It's the ability to tear down and bring up the whole cluster in one command. For example you can easily reproduce your full system locally for testing. Good luck doing that reliably with a bunch of servers without making it your full time job and making mistakes constantly.

Re pgsql backups: it's really not any more difficult on k8s. Maybe the problem is too many options: your storage class could do it for you with snapshots, you could have a script connect to the pod and run pg_dump, you could use a third party app like pgadmin4, or you could have sidecars do the work for you (assuming it is configured correctly too). In all cases you need to put the backup data somewhere. You could also use a managed postgres service and completely separate it from your cluster.

Back to the blog post: really the only problem the admin person had was not understanding the default settings of their storage class. Unfortunately that was also the most devastating. They could have also configured ArgoCD to not delete unknown manifests (opening up options to violate IaC). There is an information gap there, there is a gap in testing and backup strategy, and I hope others can learn and avoid getting bitten by it. But this blog post is not the reason to not use k8s. With some, honestly trivial, tweaks, they could still be supporting their community with these apps.

They said they used Vultr's k8s service, and they provide storage solutions also [2]. Did they contact Vultr to see if they implement some recovery safety for them? 30 day deletion policy, etc.? They probably knew about the problem in minutes, and data might have been or is recoverable in the backend.

[1] https://github.com/mastodon/chart/blob/main/values.yaml

[2] https://www.vultr.com/docs/vultr-kubernetes-engine/