yanslookup | 4 months ago
I just don't see it. Given the nature of the services they offer, it's just too risky not to use as much managed stuff with SLAs as possible. k8s alone is a very complicated control plane + a freaking database that is hard to keep happy if it's not completely static. In a prior life I went very deep on k8s, including self-managing clusters, and it's just too fragile. I literally had to contribute patches to etcd and I'm not a db engineer. I kept reading the post and seeing future failure point after future failure point.
The other aspect is there doesn't seem to be an honest assessment of the tradeoffs. It's all peaches and cream, no downsides, no tradeoffs, no risk assessment etc.
hedora|4 months ago
Microk8s doesn’t use etcd (they have their own, simpler thing), which seems like a good tradeoff at single rack scale: https://benbrougher.tech/posts/microk8s-6-months-later/
The article’s deployment has a spare rack in a second DC and they do a monthly cutover to AWS in case the colo provider has a two-site issue.
Spending time on that would make me sleep much better than hardening a deployment of etcd running inside a single point of failure.
What other problems do you see with the article? (Their monthly time estimates seem too low to me - they’re all 10x better than I’ve seen for well-run public cloud infrastructure that is comparable to their setup).
AndroTux|4 months ago
And let’s be very real here: if your cloud service goes down for a few hours because you screwed something up, or because AWS deployed some bad DNS rules again, the world moves on. At the end of the day, nobody gives a shit.
yanslookup|4 months ago
AWS truly does let you focus on your business logic and abstracts away a TON of undifferentiated work, well beyond the low-hanging fruit of system updates and load balancing.
I guess put another way: if you're providing a SaaS you need to have an SLA. Those SLAs flow from SLOs and SLIs and ultimately from a risk profile of your hw and sw. The risk of a bad HBA alone probably means a day of downtime if you don't do things perfectly. AWS has bad HBAs, CPUs, memory, disks etc all day long every day and it's not even a blip for customers, never mind downtime. And if you don't model bad HBAs in your SLAs then your board is going to be pissed when that outage inevitably happens.
Now if you don't have SLAs and you like sysops, networkops, clusterops, dbops work then sure, YOLO.
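To put rough numbers on the HBA point, here is a minimal sketch of the downtime budget arithmetic. The 99.9% yearly target is an illustrative assumption, not a figure from the article, and the function names are made up for this example.

```python
# Downtime budget arithmetic for an availability target.
# Assumption: a 99.9% yearly target, chosen only for illustration.

MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def downtime_budget_minutes(availability: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime allowed over the period at this availability."""
    return (1.0 - availability) * period_minutes


def budget_consumed(outage_minutes: float, availability: float) -> float:
    """Fraction of the yearly budget that a single outage uses up."""
    return outage_minutes / downtime_budget_minutes(availability)


if __name__ == "__main__":
    target = 0.999             # hypothetical SLO
    one_day_outage = 24 * 60   # the "day of downtime" from a bad HBA
    budget = downtime_budget_minutes(target)
    used = budget_consumed(one_day_outage, target)
    print(f"yearly budget: {budget:.1f} min")
    print(f"one-day outage uses {used:.2f}x the budget")
```

Under that assumed target the yearly budget is under nine hours, so a single day-long outage overshoots it several times over.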
fulafel|4 months ago
But, SLAs are no good (who cares about getting refunded).
yearolinuxdsktp|4 months ago
dumbledoren|4 months ago