factormeta | 1 year ago
If you're not an HN person with sysadmin skills, yes. But it is NOT that hard to have an in-house RAID HD setup with a failover server, or a failover NAT gateway. AWS and other cloud providers are just a rip-off.
lelag|1 year ago
Lichess admins are highly skilled and I'm sure they already have a well-designed infrastructure. You can see what they use at https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...
The issue was with network equipment that they didn't even manage. You can't load balance when your core network is down. There was nothing they could do, as I understand it.
More details at: https://lichess.org/@/Lichess/blog/post-mortem-of-our-longes...
lossolo|1 year ago
I have been running fault-tolerant systems spread across multiple dedicated servers (each system includes multiple distributed, replicated, or sharded DB/KV stores, Kafka, etc.). If one server suffers a hardware failure, the system recovers automatically within seconds to minutes (depending on which server or part of the service failed) without any data loss.
It's not that hard. You need the knowledge, but it's not rocket science.
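To make the "automatically recover" part concrete, here is a minimal sketch of priority-based failover. It is only an illustration, not my actual setup: `Node`, `pick_primary`, and the server names are hypothetical, and the health check is simulated with a set of failed hosts. A real deployment would also need data replication and some form of consensus to avoid split-brain.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str      # hypothetical server label
    priority: int  # lower value = preferred primary


def pick_primary(
    nodes: Sequence[Node],
    is_healthy: Callable[[Node], bool],
) -> Optional[Node]:
    """Return the most-preferred node that passes its health check, or None."""
    healthy = [n for n in nodes if is_healthy(n)]
    return min(healthy, key=lambda n: n.priority) if healthy else None


nodes = [Node("db-1", 0), Node("db-2", 1), Node("db-3", 2)]
down = {"db-1"}  # simulate a hardware failure on the preferred node

primary = pick_primary(nodes, lambda n: n.name not in down)
print(primary.name if primary else "no healthy node")
```

Running the selection again after every failed health check is what moves traffic to the next server without anyone stepping in.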
olieidel|1 year ago
OP's comment is valid: physical servers might incur downtime.
But I do agree with your sentiment. "Downtime" is not an argument that should tilt the discussion towards either physical servers or the cloud. AWS data centers famously also have outages, while physical servers often have uptimes of multiple years. So what's better? It's hard to tell, but at the very least, none of these solutions is downtime-free.
rcarmo|1 year ago