sitkack | 6 years ago:

Skimmed the article; didn't read the paper, if there was one. The article talks about data center links, but those are rarely the failure mode. The failure mode is most often a bad config push, which then brings about a gray failure: a route may exist, yet because of priority levels and the way routes are announced, part of the network is down even though a physical path exists. This is a solution to a highly constrained model, not to actual cloud computing. Putting another complex system in front of any sort of traffic routing certainly increases the failure modes, but the article does make a nod toward "signal quality" as the metric for traffic shifting.

cfors | 6 years ago:

> Failure probabilities were obtained by checking the signal quality of every link every 15 minutes. If the signal quality ever dipped below a receiving threshold, they considered that a link failure.

If the signal quality could be a higher-level construct (Layer 7 errors), this could route around bad config pushes if they are constrained. I'm not going to pretend that this is definitely feasible, but at least that was my first thought.

jamez1 | 6 years ago:

It just appears to apply VaR to failure rates, which is hardly a "Wall Street secret".

https://people.csail.mit.edu/ghobadi/papers/teavar_sigcomm_2...

jamez1 | 6 years ago:

The authors also completely misunderstand how VaR works: it is the minimum value at risk, not the maximum.

> Value at Risk (VaR) [33] captures precisely these bounds. Given a probability threshold β (say β = 0.99), VaRβ provides a probabilistic upper bound on the loss: the loss is less than VaRβ with probability β.

It is actually a probabilistic lower bound on the loss, i.e. the minimum loss among the worst (1 − β) fraction of outcomes.

unknown | 6 years ago:

[deleted]

Buge | 6 years ago:

> for a target percentage of time—say, 99.9 percent—the network can handle all data traffic, so there is no need to keep any links idle. During that 0.01 percent of time, the model also keeps the data dropped as low as possible.

99.9 + 0.01 != 100

thinkloop | 6 years ago:
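For what it's worth, the idea above can be sketched in a few lines. Everything here is hypothetical — the probe, names, and thresholds are illustrations, not anything from the paper: treat a link as failed when application-level errors spike, even if the physical signal is fine, so a link blackholed by a bad config push still counts as "failed" for the optimizer.

```python
# Hypothetical probe: classify a link as failed from Layer 7 symptoms
# (e.g. HTTP 5xx rate across flows traversing it), not just signal quality.
# All names and threshold values below are made up for illustration.

def link_failed(l7_error_rate: float, signal_quality_db: float,
                error_threshold: float = 0.05,
                signal_threshold_db: float = -24.0) -> bool:
    """A link counts as failed if EITHER the physical signal drops below
    the receiving threshold OR application-level errors spike, catching
    'gray' failures where a physical path exists but traffic is dropped."""
    return (signal_quality_db < signal_threshold_db
            or l7_error_rate > error_threshold)

# A bad config push: signal is fine, but requests through the link fail.
print(link_failed(l7_error_rate=0.40, signal_quality_db=-10.0))   # True
# Healthy link: good signal, negligible Layer 7 errors.
print(link_failed(l7_error_rate=0.001, signal_quality_db=-10.0))  # False
```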
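Both readings describe the same number: the β-quantile of the loss distribution. A minimal pure-Python sketch, using a made-up loss sample rather than anything from the paper, shows the two views coincide:

```python
import random

random.seed(0)
# Hypothetical per-interval losses (e.g. fraction of traffic dropped);
# the exponential shape is an arbitrary choice for illustration.
losses = sorted(random.expovariate(100) for _ in range(100_000))

beta = 0.99
# Empirical VaR_beta: the beta-quantile of the observed losses.
var_beta = losses[int(beta * len(losses))]

# "Upper bound" reading: the loss is <= VaR_beta with probability ~beta.
frac_below = sum(l <= var_beta for l in losses) / len(losses)
print(frac_below)  # ~0.99

# "Lower bound" reading: VaR_beta is the minimum loss among the
# worst (1 - beta) fraction of outcomes.
tail_min = min(l for l in losses if l >= var_beta)
print(tail_min == var_beta)  # True: both readings name the same quantile
```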