z64 | 2 years ago

Hi there, this is Zac from Kagi. I just posted some other details here that might be of interest:

https://news.ycombinator.com/item?id=39019936

TL;DR - we are a tiny, young team at the center, and everyone has a closet full of hats they wear. No dedicated SRE team yet.

> "what happens if a ton of searches happen?"

In fairness, you can check out https://kagi.com/stats - "a lot of searches" is already happening, approaching 400k per day, and our systems still operate with plenty of capacity day-to-day, with some auto-scaling measures on top of that.

The devil is in the details of some users exploiting a pathological case. What we lacked (and have now rightfully gained) was the experience to know which organic or pathological traffic we could have predicted and simulated ahead of time.

Load-simulating 20,000 users searching concurrently sounds like it would have been a sound experiment early on, and we did do some things resembling this. But considering this incident, it still would not have caught this issue. We have also had maybe 10 people run security scanners on our production services at this point that generated more traffic than this incident.

It is extremely difficult to balance this kind of development when we also have features to build, and clearly we could do with more of it! As mentioned in my other post, we are looking to expand the team in the near term so that we are not spread so thin on these sorts of efforts.

There is a lot that could be said in hindsight, but I hope this is a bit more transparent about how we ended up here.

smcleod|2 years ago

Zac, I think you’re doing great handling and communicating this. Keep up the great work and have fun learning while you’re at it!

rconti|2 years ago

What does "pathological" mean in this context?

fancy_pantser|2 years ago

being such to a degree that is extreme, excessive, or markedly abnormal (with a connotation of it happening on purpose)