top | item 42290688

(no title)

tutfbhuf | 1 year ago

I have experience running Kubernetes clusters on Hetzner dedicated servers, as well as working with a range of fully or highly managed services like Aurora, S3, and ECS Fargate.

From my experience, the cloud bill on Hetzner can sometimes be as low as 20% of an equivalent AWS bill. However, this cost advantage comes with significant trade-offs.

On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe storage, MariaDB operators, Cilium for networking, and ArgoCD for deploying Helm charts. We had to handle Kubernetes cluster updates ourselves, which included facing a complete cluster failure at one point. We also encountered various bugs in both Kubernetes and Ceph, many of which were documented in GitHub issues and Ceph trackers. The list of tasks to manage and monitor was endless. Depending on the number of workloads and the overall complexity of the environment, maintaining such a setup can quickly become a full-time job for a DevOps team.

In contrast, using AWS or other major cloud providers allows for a more hands-off setup. With managed services, maintenance often requires significantly less effort, reducing the operational burden on your team.

In essence, with AWS, your DevOps workload is reduced by a significant factor, while on Hetzner, your cloud bill is significantly lower.

Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.

discuss

supriyo-biswas|1 year ago

This is definitely some ChatGPT output being posted here and your post history also has a lot of this "While X, Y also does Z. Y already overlaps with X" output.

I'd like to see your breakdowns as well, given that the cost difference between a 2 vCPU, 4GB configuration (as an example) and a similar configuration on AWS is priced much higher.

There's also https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne... to reduce the operational burden that you speak of.

tutfbhuf|1 year ago

It is my ouput, but I use ChatGPT to fix my spelling and grammar. Maybe my prompt for that should be refined in order to not alter the wording too much.

0xFF0123|1 year ago

While I agree that your characterisation is true for a lot of chatgpt output, it can also be true for a human explaining their nuanced point of view.

unknown|1 year ago

[deleted]

MathMonkeyMan|1 year ago

I've never operated a kubernetes cluster except for a toy dev cluster for reproducing support issues.

One day it broke because of something to do with certificates (not that it was easy to determine the underlying problem). There was plenty of information online about which incantations were necessary to get it working again, but instead I nuked it from orbit and rebuilt the cluster. From then on I did this every few weeks.

A real kubernetes operator would have tooling in place to automatically upgrade certs and who knows what else. I imagine a company would have to pay such an operator.

_bare_metal|1 year ago

This.

I run BareMetalSavings.com[0], a toy for ballpark-estimating bare-metal/cloud savings, and the companies that have it hardest to move away from the cloud are those who are highly dependent on Kubernetes.

It's great for the devs but I wouldn't want to operate a cluster.

[0]: https://www.BareMetalSavings.com

declan_roberts|1 year ago

That's just not how it works on any scale other than "toy"

KaiserPro|1 year ago

Ceph is a bastard to run. Its expensive, slow and just not really ready. Yes I know people use it, but compared to a fully grown up system (ie lustre[don't its raid 0 in prod] or GPFS [great but expensive]) its just a massive time sync.

You are much better off having a bunch of smaller file systems exported over NFS make sure that you have block level replication. Single address space filesystems are ok and convenient, but most of the time are not worth the cost of admin to get reliable at scale. like a DB shard your filesystems, especially as you can easily add mapping logic to kubernetes to make sure you get the right storage to the right image.

mfld|1 year ago

I saw that Hetzner is beta testing ceph-based object storage. This could make the setup much easier. Anyone tested this already?

sgarland|1 year ago

I agree that it is hideously complicated (to anyone saying “just use Rook,” I’ll counter that if you haven’t read through Ceph’s docs in full, you’re deluding yourself that you know how to run it), but given that CERN uses it at massive scale, I think it’s definitely prod-ready.

freedomben|1 year ago

I mostly agree, but it surprises me that people don't often consider a solution right in the center, such as openshift. Have a much, much less burden for devops and have all the power and flexibility of running on bare metal. It's a great hybrid between a fully managed and expensive service versus a complete build your own. It's expensive enough. Todd, for startups it is not likely a good option, but if you have a cluster with at least 72 GB of RAM or 36 CPUs going (about 9 mid size nodes), you should definitely consider something like openshift.

mountainriver|1 year ago

Manually updating k8s clusters is a huge tradeoff. I can’t imagine doing that to save a couple bucks unless I was desperate

TheDong|1 year ago

I dunno, I've had to spend like two or three hours each month on updating mine for its entire lifetime (of over 5 years now), and that includes losing entire nodes to hardware failure and spinning up new ones.

Originally it was ansible, and so spinning up a new node or updating all nodes was editing one file (k8s version and ssh node list), and then running one ansible command.

Now I'm using nixos, so updating is just bumping the version number, a hash, and typing "colmena apply".

Even migrating the k8s cluster from ansible to nixos was quite easy, I just swapped one node at a time and it all worked.

People are so afraid of just like learning basic linux sysadmin operations, and yet it also makes it way easier to understand and debug the system too, so it pays off.

I had to help someone else with their EKS cluster, and in the end debugging the weird EKS AMI was a nightmare and required spending more time than all the time I've had to spend on my own cluster over the last year combined.

From my perspective, using EKS both costs more money, gives you a worse K8s (you can't use beta features, their ami sucks), and also pushes you to have a worse understanding of the system so that you can't understand bugs as easily and when it breaks it's worse.

dijit|1 year ago

if the "couple of bucks" ends up being the cost of an entire team, then hire a small team to do it.

Then get mad at them because they don't "produce value", and fold it into a developers job with an even higher level of abstraction again. This is what we always do.

lucasrattz|1 year ago

We at https://syself.com have made a platform with "one-click updates". 100% vanilla Kubernetes on Hetzner.

p_l|1 year ago

The "couple bucks" in my experience were difference between viable business and bankrupt one - including time spent on maintaining k8s!

spwa4|1 year ago

> Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.

Sure, but the TLDR is going to be that if you employ n or more sysadmins, the cost savings will dominate. With 2 < n < 7. So for a given company size, Hetzner will start being cheaper at some point, and it will become more extreme the bigger you go.

Second if you have a "big" cost, whatever it is, bandwidth, disk space (essentially anything but compute), cost savings will dominate faster.

stackskipton|1 year ago

Not always. Employing Sysadmins doesn't mean Hetzner is cheaper because those "Sysadmin/Ops type people" are being hired to managed the Kubernetes cluster. And Ops type people who truly know Kubernetes are not cheap.

Sure, you can get away with legoing some K3S stuff together for a while but one major outage later, and that cost saving might have entirely disappeared.

UltraSane|1 year ago

Even a short outage can completely wipe out any savings.

kshri24|1 year ago

Is it just me or do the last 3 paragraphs feel like ChatGPT output?

tutfbhuf|1 year ago

I used GPT4o to fix all my spelling and grammar mistakes, maybe it went a little too far, but this is 100% my comment

runeks|1 year ago

Isn't the point of ChatGPT to mimic sentences written by humans?

andai|1 year ago

Yeah, I was wondering the same thing.

LordMignion|1 year ago

[deleted]