kg | 8 days ago
This seems REALLY bad for reliability? I guess the idea is that it's better to have things not respond to requests than to lose data, but the outcome described in the article is pretty nasty.
It seems like the solution they arrived at was to "fix" this at the filesystem level by making fsync no longer deliver reliability, which seems like a pretty clumsy solution. I'm surprised they didn't find some way to make etcd more tolerant of slow storage. I'd be wary of turning off filesystem level reliability at the risk of later running postgres or something on the same system and experiencing data loss when what I wanted was just for kubernetes or whatever to stop falling over.
PunchyHamster|8 days ago
It is. If it really starts to crap out above 100ms, just a small hiccup in the network-attached storage of the VM it is running on can trigger it.
But it's not as simple as that: if you have multiple nodes and one starts to lag, kicking it out is the only way to keep latency manageable.
A better solution would be to keep a cluster-wide disk latency average and only kick a node that is both slow and much slower than the other nodes; that would also auto-tune to slow setups, like someone running it on some spare HDDs in a homelab.
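To make that concrete, here is a rough sketch of the "slow and much slower than its peers" rule. It is not how etcd works today; `nodes_to_evict`, `floor_ms` and `ratio` are names and knobs I made up, and I compare against the median of the other nodes rather than a plain average so a single outlier does not drag the baseline up.

```python
from statistics import median


def nodes_to_evict(latencies_ms, floor_ms=100.0, ratio=3.0):
    """Return nodes that are slow in absolute terms AND much slower than peers.

    latencies_ms: dict mapping node name -> recent disk latency in ms.
    floor_ms, ratio: hypothetical tuning knobs, not real etcd settings.
    """
    evict = []
    for node, latency in latencies_ms.items():
        peers = [v for name, v in latencies_ms.items() if name != node]
        if not peers:
            continue  # a single node has nothing to be compared against
        baseline = median(peers)
        # Both conditions must hold: above the absolute floor and far above
        # what the rest of the cluster is seeing.
        if latency > floor_ms and latency > ratio * baseline:
            evict.append(node)
    return sorted(evict)
```

With `{"a": 5, "b": 6, "c": 400}` only `c` qualifies. With a uniformly slow homelab such as `{"a": 180, "b": 200, "c": 220}` nothing gets kicked, which is the auto-tuning effect.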
denysvitali|8 days ago
weiliddat|8 days ago
Also, for prod, etcd recommends running on SSDs to minimize the variance of fsync/write latencies.
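If you want a quick feel for that variance on your own disk, something like the sketch below works. It is only a rough probe, not a proper benchmark; the function names, sample count and payload size are arbitrary choices of mine.

```python
import math
import os
import tempfile
import time


def fsync_latencies_ms(directory, samples=200, payload=b"x" * 2048):
    """Append small writes to a temp file in `directory`, timing each fsync."""
    latencies = []
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        fd = f.fileno()
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the write down to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Run `fsync_latencies_ms` against the directory on the disk you care about and compare `percentile(lat, 50)` with `percentile(lat, 99)`; a big gap between the two is the variance being talked about.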
justincormack|8 days ago
justsomehnguy|8 days ago
_ananos_|8 days ago
api|8 days ago
landl0rd|8 days ago
If you run this on single-tenant boxes that are set up carefully (ideally no multi-tenant vCPUs, low network RTT, fast CPU, low swappiness, `ionice` it to a high I/O priority, `performance` rather than `ondemand` governor, XFS), it scales really nicely and you shouldn't run into this.
So there are cases where you actually do want this. A lot of k8s setups would be better served by just hitting postgres, sure, and don't need the big fancy toy with lots of sharp edges. It has its raison d'être, though. Also, you can just boot slow nodes and run a lot of nodes.
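For the host tuning part, a small sanity check along these lines can catch the obvious misses. It is a sketch under assumptions: the paths in the usage note are the usual Linux locations and may not exist on every box, and `max_swappiness=10` is a threshold I picked for illustration, not an official recommendation.

```python
from pathlib import Path


def read_setting(path):
    """Return the stripped contents of a procfs/sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def tuning_warnings(governor, swappiness, max_swappiness=10):
    """Return human-readable warnings for the two settings passed in.

    governor: contents of a cpufreq scaling_governor file, or None.
    swappiness: contents of vm.swappiness as a string, or None.
    max_swappiness: hypothetical cutoff for what counts as "low".
    """
    warnings = []
    if governor is None:
        warnings.append("could not read CPU governor")
    elif governor != "performance":
        warnings.append(f"CPU governor is {governor!r}, expected 'performance'")
    if swappiness is None:
        warnings.append("could not read vm.swappiness")
    elif int(swappiness) > max_swappiness:
        warnings.append(f"vm.swappiness is {swappiness}, expected <= {max_swappiness}")
    return warnings
```

Typical usage would be `tuning_warnings(read_setting("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"), read_setting("/proc/sys/vm/swappiness"))`; an empty list means neither setting looks off.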
fcantournet|8 days ago
[deleted]