Copying my response over from another comment:

I totally get that, but how hard would it be to actually make calls to your own API from the status page? If it fails, display a vague message saying there might be issues and that you are looking into it. Clearly these metrics and alerts exist internally too. I'm not asking for an instant RCA or confirmation of the scope of the outage. Just stop gaslighting me.
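To make the idea concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: the names `http_probe` and `status_message`, the retry count, and the message wording are hypothetical, and a real status page would run this from infrastructure independent of the API it is probing.

```python
import urllib.request
from typing import Callable

# Hypothetical wording; deliberately vague, as suggested above.
OK_MESSAGE = "All systems operational."
DEGRADED_MESSAGE = "We may be experiencing issues and are looking into it."


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def status_message(probe: Callable[[], bool], attempts: int = 3) -> str:
    """Run the probe up to `attempts` times; report OK if any attempt succeeds."""
    for _ in range(attempts):
        if probe():
            return OK_MESSAGE
    return DEGRADED_MESSAGE
```

Usage would be something like `status_message(lambda: http_probe("https://api.example.com/health"))`, where the URL is a placeholder. Retrying a few times before showing the degraded message is one simple way to avoid flapping on a single dropped request.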

rozenmd|3 months ago

More and more status pages update automatically based on uptime data (I built a service that provides this: OnlineOrNot).

Early-stage startups typically have engineering own the status page, but as they grow, ownership usually transfers to customer support. These teams optimize for controlling the message rather than for technical detail, which explains the shift toward vaguer, slower incident descriptions.

Yeri|3 months ago

Because you'd have a ton of downtime and they'd rather hide it if they could. :)

I used to work at a very big cloud service provider, and as the initial comment mentioned, we'd get a ton of escalations/alerts in a day, but the majority didn't necessarily warrant a status page update (only affecting X% of users, or not 'major' enough, or not having any visible public impact).

I don't really agree with that, but that was how it was. A manager would decide whether or not to update the status page, the wording was reviewed before being posted, etc. All that takes a lot of time.

swiftcoder|3 months ago

Not hard at all (our internal dashboards did just that). But to have that data posted publicly was not in the best interests of the business.

And honestly, having been on a few customer escalations where they threatened legal action over outages, one kind of starts to see things the business way...

dvt|3 months ago

> Just stop gaslighting me.

I heard this years ago from someone, but there's a material impact on a company's bottom line if those pages get updated, so that's why someone fairly senior usually has to "approve" it. Obviously it's technically trivial, but if they acknowledge downtime (for example, like in the AWS case), investors will have questions, it might make quarterly reports, and it might impact the stock price.

So it's not just a "status page," it's an indicator that could affect market sentiment, so there's a lot of pressure to leave everything "green" until there's no way to avoid it.

FinnKuhn|3 months ago

I feel like there should at least be some sort of disclaimer, then, telling me that the status page can take up to xx minutes to show an outage, rather than making it seem as if it updates instantaneously. That way I could wait those xx minutes before filing a ticket with support, instead of opening a case thinking it is an isolated problem on my end rather than a major outage.