muhammadusman | 2 years ago

I was one of the users that went and reported this issue on Discord. I love Kagi but I was a bit disappointed to see that their status page showed everything was up and running. I think that made me a bit uneasy and it shows their status pages are not given priority during incidents that are affecting real users. I hope in the future the status page is accurately updated.

In the past, services I heavily rely on (e.g. GitHub) have updated their status pages immediately, and this allows me to rest assured that people are aware of the issue and that it's not an issue with my devices. When this happened with Kagi, I was looking up the nearest open grocery stores since we were getting snow later that day, so it was almost like I got let down b/c I had to go to Google for this.

I will continue using Kagi b/c 99.9% of the other time I've used it, it has been better than Google but I hope the authors of the post-mortem do mean it when they say they'll be moving their status page code to a different service/platform.

And thanks again Zac for being transparent and writing this up. This is part of good engineering!

Terretta|2 years ago

> In the past, services I heavily rely on (e.g. GitHub) have updated their status pages immediately

Also in the past, other times GitHub has not updated its status page immediately.

phyzome|2 years ago

As an engineer on call, I have been in this conversation so many times:

"Hey, should we go red?" "I don't know, are we sure it's an outage, or just a metrics issue?" "How many users are affected again?" "I can check, but I'm trying to read stack traces right now." "Look, can we just report the issue?" "Not sure which services to list in the outage"

...and so on. Basically, putting anything up on the status page is a conversation, and the conversation consumes engineer time and attention, and that's more time before the incident is resolved. You have to balance communication and actually fixing the damn thing, and it's not always clear what the right balance is.

If you have enough people, you can have a Technical Incident Manager handle the comms and you can throw additional engineers at the communications side of it, but that's not always possible. (Some systems are niche, underdocumented, underinstrumented, etc.)

My personal preference? Throw up a big vague "we're investigating a possible problem" at the first sign of trouble, and then fill in details (or retract it) at leisure. But none of the companies I've worked at like that idea, so... [shrug]

Gareth321|2 years ago

This is exactly why those status pages are almost always a lie. Either they need to be fully automated without some middle manager hemming and hawing, or they shouldn’t be there at all. From a customer’s perspective, I’ve been burned so many times on those status pages that I ignore them completely. I just assume they’re a lie. So I’ll contact support straight away - the very thing these status pages were intended to mitigate.

virtue3|2 years ago

I think your bit at the end is the most important.

ANY communication is better than no communication. "Everything is fine, it must be you" is the worst feeling in these cases, especially if your business is reliant on said service and you can't figure out why you are borked (e.g. the GitHub ones).

smsm42|2 years ago

IMHO, any significant growth in 500s (that's what I was getting during the outage) warrants a mention on the status page. I've seen a lot of stuff, so if I see an acknowledged outage, I'll just wait for people to do their jobs. Stuff happens. If I see an unacknowledged one, I get worried that the people who need to know don't, and that undermines my confidence in the whole setup. I'd never complain if the status page says maybe there's a problem but I don't see one. I will complain in the opposite case.

PeterStuer|2 years ago

And that is before 'going red' has ties to performance metrics with SLA impacts ...

TeeWEE|2 years ago

Connect your status page to actual metrics and decide a threshold for downtime. Boom, you’re done.
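
For illustration only, here is a minimal sketch of that idea in Python, assuming the metric is the share of 5xx responses over a recent window. The names `Sample` and `status_for`, the status strings, and the 1% and 5% thresholds are all hypothetical, not anything Kagi or a particular status page tool uses. It only computes a status; publishing it to a status page is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """One scrape from a hypothetical metrics source."""
    total_requests: int
    server_errors: int  # responses with a 5xx status code


def error_rate(samples: Iterable[Sample]) -> float:
    """Fraction of requests in the window that returned a 5xx."""
    window = list(samples)
    total = sum(s.total_requests for s in window)
    errors = sum(s.server_errors for s in window)
    # With no traffic there is nothing to measure, so report 0.0.
    return errors / total if total else 0.0


def status_for(samples: Iterable[Sample],
               degraded_at: float = 0.01,
               outage_at: float = 0.05) -> str:
    """Map the window's error rate to a status label.

    The thresholds are assumed values; a real deployment would
    tune them against its own traffic.
    """
    rate = error_rate(samples)
    if rate >= outage_at:
        return "major_outage"
    if rate >= degraded_at:
        return "investigating"
    return "operational"
```

With the assumed defaults, a window of `[Sample(1000, 20)]` is a 2% error rate and maps to `"investigating"`. The hard parts raised elsewhere in the thread (which services to list, whether the metrics themselves are trustworthy) are not addressed by a threshold check like this.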

lambdaba|2 years ago

I'm only replying to the praise here - I too, although I haven't fully switched, had a very enticing moment with Kagi when it returned a result that couldn't even be found on Google at any page in the results. This really sold me on Kagi and I've been going back and forth with some queries, but I have to say that between LLMs, Perplexity, and Google often answering my queries right on the search page, I just don't have that many queries left for Kagi.

If Kagi would somehow merge with Perplexity, now that would be something.

spdif899|2 years ago

Kagi does offer AI features in their higher subscription tier, including summary, research assistance, and a couple others. Plus I think they have basically a frontend for GPT-4 that uses their search engine for browsing, and they just added vision support to it today.

I don't subscribe to those features or any AI tool yet, just pointing out there could be a version of Kagi that is able to replace your ChatGPT sub and save you money.

herpdyderp|2 years ago

I envy your experiences with other services. I've never seen any service's status page show downtime when or even soon after I start experiencing it. Often they simply never show it at all.

Neikius|2 years ago

Microsoft is notorious for their lax status page updates...

wiml|2 years ago

Is there anyone who isn't?

NetOpWibby|2 years ago

It's worth noting that the status page software they use doesn't update automatically.

> Please note that with all that cState can do, it cannot do automatic monitoring out of the box.

https://github.com/cstate/cstate

ParetoOptimal|2 years ago

I guess a status page that doesn't auto-update is good for PR, but it's not very useful to show... you know... the status.