pwmtr | 1 year ago
Yeah, this is generally a good practice. The silver lining is that our suffering helped uncover the underlying issue faster. :)
This isn’t part of the blog post, but for future orders we also considered taking delivery of the servers and keeping them idle, without actual customer workload, for about a month. This would be more expensive, but it could help identify potential issues without impacting our users. In our case, the crashes started three weeks after we deployed our first AX162 server, so we would need a buffer period of at least a month, maybe longer.
ThePowerOfFuet|1 year ago
Did you actually uncover the true root cause? Or did they finally uncap the power consumption without telling you, just as they neither confirmed nor denied having limited it?
pwmtr|1 year ago
I don't believe they simply lifted a power cap (if there was one in the first place). I genuinely think the fix came from the motherboard replacements. We had two batches of motherboard replacements, and after that the issue disappeared.
If someone from Hetzner is here, maybe they can give extra information.
oz3d|1 year ago
[1] https://status.hetzner.com/incident/7fae9cca-b38c-4154-8a27-...