(no title)
Matthias247 | 3 months ago
The report actually seems to confirm this - it was indeed a crash on ingesting the bad config. However I'm actually surprised that the long duration didn't come from "it takes a long time to restart the fleet manually" or "tooling to restart the fleet was bad".
The problem mostly seems to have been "we didn't knew whats going on". Some look into the proxy logs would hopefully have shown the stacktrace/unwrap, and metrics about the incoming requests would hopefully have shown that there's no abnormal amount of requests coming in.
No comments yet.