(no title)
z64 | 2 years ago
https://news.ycombinator.com/item?id=39019936
TL;DR - we are a tiny, young team at the center, and everyone has a closet full of hats they wear. No dedicated SRE team yet.
> "what happens if a ton of searches happen?"
In fairness, you can checkout https://kagi.com/stats - "a lot of searches" is already happening, approaching 400k per day, and systems still operate with plenty of capacity day-to-day, in addition to some auto-scaling measures.
The devil is in the details of some users exploting a pathological case. Our lack of experience (now rightfully gained) is knowing what organic or pathological traffic we could have predicted and simulated ahead of time.
Load-simulating 20,000 users searching concurrently sounds like it would have been a sound experiment early on, and we did do some things resembling this. But considering this incident, it still would not have caught this issue. We have also had maybe 10 people run security scanners on our production services at this point that generated more traffic than this incident.
It is extremely difficult to balance this kind of development when we also have features to build, and clearly we could do with more of it! As mentioned in my other post, we are looking to expand the team in the near term so that we are not spread so thin on these sorts of efforts.
There is a lot that could be said in hindsight, but I hope that is a bit more transparent WRT how we ended up here.
smcleod|2 years ago
rconti|2 years ago
fancy_pantser|2 years ago