top | item 23884430


zzzcpan | 5 years ago

You are missing the point. Solving the truck problem is exactly what you shouldn't do, at least not until your system is resilient. The failure could be something entirely different: it could be law enforcement raiding the data center, and your wall around it won't protect it from them. So instead you approach the system in terms of what it has to rely on and all the possible states of those dependencies, which maps to a very small number of decisions, like whether a server is available or not. If it's not available, it really doesn't matter which of the infinitely many things that could happen to it, or to the data center it is in, actually did. You simply don't return it to users, and you keep enough independent servers in enough independent data centers to achieve a specific availability target. It's really not difficult.
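To make that concrete, here is a rough sketch of the two decisions involved, not a description of my actual setup. The `Server` type, the function names and the numbers are all made up for illustration, and the availability math assumes sites fail independently of each other, which is the whole reason for spreading them across unrelated data centers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # All fields are hypothetical; use whatever your health checks give you.
    address: str
    datacenter: str
    available: bool


def servers_to_return(servers):
    """Keep only available servers. Why a server is down is never inspected."""
    return [s for s in servers if s.available]


def combined_availability(per_site_availability, sites):
    """Chance that at least one of `sites` independent sites is up."""
    return 1 - (1 - per_site_availability) ** sites


def sites_needed(per_site_availability, target):
    """Smallest number of independent sites that reaches `target`."""
    if not 0 < per_site_availability <= 1:
        raise ValueError("per_site_availability must be in (0, 1]")
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while combined_availability(per_site_availability, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    fleet = [
        Server("192.0.2.10", "dc-a", True),
        Server("192.0.2.20", "dc-b", False),  # truck, raid, fire: irrelevant
        Server("192.0.2.30", "dc-c", True),
    ]
    print([s.address for s in servers_to_return(fleet)])
    print(sites_needed(0.99, 0.99999))
```

With these assumed numbers, sites that are each up 99% of the time need three independent locations to reach a 99.999% target. The independence assumption is the part that takes real work: two servers behind the same network provider or power feed do not count as two sites.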

I understand that most of those leetcode corporations don't care much about resilience and are likely even incapable of producing highly reliable systems, and that they may give you the false impression that reliability is an unachievable fantasy. But it's not. We have enough research on it and can do it really well today if needed. We are not in the Titanic era anymore.

I have high confidence in these things (not in "predicting the unforeseeable") because I've done them myself. My edge infrastructure has had about half an hour of downtime in total over many years, almost a decade by now.
