movpasd | 4 months ago

I used Claude to extract the start and end times of major historical AWS outages from their post-event summaries: https://aws.amazon.com/premiumsupport/technology/pes/

The cumulative distribution of outage durations ends up looking pretty exponential, which (I think) means it's roughly memoryless: if you estimate the time left in an outage as the mean remaining duration of all past outages that ran longer than the current one has so far, you get a more or less flat value of around 8 hours regardless of how long the outage has already lasted, if I've done my maths right.
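
For reference, here is the standard memorylessness calculation behind that claim. It assumes the outage duration $T$ is exactly exponential with rate $\lambda$, which is an idealisation; the data only looks roughly exponential.

```latex
P(T > t + s \mid T > t) = \frac{e^{-\lambda (t+s)}}{e^{-\lambda t}} = e^{-\lambda s},
\qquad
E[T - t \mid T > t] = \int_0^\infty e^{-\lambda s}\, ds = \frac{1}{\lambda} = E[T].
```

Under that assumption the expected time left equals the overall mean duration, whatever the elapsed time $t$ is.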

Not a statistician, so I'm sure I've committed some statistical crimes there!

Unfortunately I can't find an easy way to upload images of the charts I've made right now, but you can tinker with my data:

    cause,outage_start,outage_duration,incident_duration
    Cell management system bug,2024-07-30T21:45:00.000000+0000,0.2861111111111111,1.4951388888888888
    Latent software defect,2023-06-13T18:49:00.000000+0000,0.08055555555555555,0.15833333333333333
    Automated scaling activity,2021-12-07T15:30:00.000000+0000,0.2861111111111111,0.3736111111111111
    Network device operating system bug,2021-09-01T22:30:00.000000+0000,0.2583333333333333,0.2583333333333333
    Thread count exceeded limit,2020-11-25T13:15:00.000000+0000,0.7138888888888889,0.7194444444444444
    Datacenter cooling system failure,2019-08-23T03:36:00.000000+0000,0.24583333333333332,0.24583333333333332
    Configuration error removed setting,2018-11-21T23:19:00.000000+0000,0.058333333333333334,0.058333333333333334
    Command input error,2017-02-28T17:37:00.000000+0000,0.17847222222222223,0.17847222222222223
    Utility power failure,2016-06-05T05:25:00.000000+0000,0.3993055555555555,0.3993055555555555
    Network disruption triggering bug,2015-09-20T09:19:00.000000+0000,0.20208333333333334,0.20208333333333334
    Transformer failure,2014-08-07T17:41:00.000000+0000,0.13055555555555556,3.4055555555555554
    Power loss to servers,2014-06-14T04:16:00.000000+0000,0.08333333333333333,0.17638888888888887
    Utility power loss,2013-12-18T06:05:00.000000+0000,0.07013888888888889,0.11388888888888889
    Maintenance process error,2012-12-24T20:24:00.000000+0000,0.8270833333333333,0.9868055555555555
    Memory leak in agent,2012-10-22T17:00:00.000000+0000,0.26041666666666663,0.4930555555555555
    Electrical storm causing failures,2012-06-30T02:24:00.000000+0000,0.20902777777777776,0.25416666666666665
    Network configuration change error,2011-04-21T07:47:00.000000+0000,1.4881944444444444,3.592361111111111
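
A rough sketch of the "time left" estimate, in case anyone wants to reproduce it. This is not necessarily the exact calculation behind the 8-hour figure. It assumes the `outage_duration` column is in days (the values only make sense against that figure if so), and the function name `mean_residual_hours` is made up for illustration.

```python
# outage_duration values copied from the table above.
# Assumption: they are fractions of a day, so multiply by 24 to get hours.
OUTAGE_DURATION_DAYS = [
    0.2861111111111111, 0.08055555555555555, 0.2861111111111111,
    0.2583333333333333, 0.7138888888888889, 0.24583333333333332,
    0.058333333333333334, 0.17847222222222223, 0.3993055555555555,
    0.20208333333333334, 0.13055555555555556, 0.08333333333333333,
    0.07013888888888889, 0.8270833333333333, 0.26041666666666663,
    0.20902777777777776, 1.4881944444444444,
]

durations = [d * 24.0 for d in OUTAGE_DURATION_DAYS]  # hours


def mean_residual_hours(durations, elapsed):
    """Mean remaining time over outages that lasted longer than `elapsed` hours.

    Returns None when no historical outage lasted that long.
    """
    remaining = [d - elapsed for d in durations if d > elapsed]
    if not remaining:
        return None
    return sum(remaining) / len(remaining)


overall_mean = sum(durations) / len(durations)
print(f"overall mean: {overall_mean:.1f} h across {len(durations)} outages")

# Estimated time left after the outage has already run for `elapsed` hours.
for elapsed in (0, 2, 4, 6, 8):
    estimate = mean_residual_hours(durations, elapsed)
    survivors = sum(d > elapsed for d in durations)
    print(f"elapsed {elapsed:>2} h: {estimate:.1f} h left (based on {survivors} outages)")
```

On this data the overall mean comes out at about 8.2 hours. With only 17 outages, the estimates at larger elapsed times rest on a handful of events, so they will wobble around that value, not sit exactly on it.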
