This seems like a pretty common response to a breaking incident at an app running at scale. Requests flow through to a failing system and trigger HTTP 500s. Those requests may pachinko through the stack, making a variety of calls that compound the degradation of a system already weathering an unplanned failure state.
Engineers stop the bleeding by 503'ing requests at the perimeter or putting up a static maintenance page. This lets things like caches, DBs, and app servers cool off while a rollback or a revert goes out. Then, once the system is stable again, they let requests flow through again (slowly, of course).
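A minimal sketch of that pattern, assuming a hypothetical `PerimeterGate` middleware (the class name and methods are illustrative, not from any real framework): trip the gate to shed everything with 503s during the incident, then ramp the admitted fraction back up gradually once the rollback lands.

```python
import random


class PerimeterGate:
    """Illustrative perimeter kill switch: shed all traffic with 503s
    during an incident, then re-admit a slowly increasing fraction of
    requests so caches and DBs warm back up instead of being stampeded."""

    def __init__(self):
        self.admit_fraction = 1.0  # 1.0 = normal operation, 0.0 = full shed

    def trip(self):
        # Stop the bleeding: reject everything at the edge.
        self.admit_fraction = 0.0

    def ramp(self, step=0.25):
        # After the rollback/revert goes out, let traffic back in slowly.
        self.admit_fraction = min(1.0, self.admit_fraction + step)

    def handle(self, request_handler):
        # Probabilistically admit a fraction of requests; shed the rest.
        if random.random() < self.admit_fraction:
            return request_handler()
        # Retry-After hints well-behaved clients to back off.
        return (503, {"Retry-After": "30"}, "Service temporarily unavailable")
```

The probabilistic admit is deliberately dumb and stateless; real deployments usually do this at the load balancer (e.g. weighted routing to a static page) rather than in app code, but the shape is the same: one switch to shed, one knob to ramp.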
Woofles|8 years ago