It's a good mindset to have, but I think ssh access should still be available as a last resort on prod systems, and using it should perhaps trigger some sort of postmortem process, with steps to detect the problem without ssh in the future. There is always going to be a bug that you cannot reproduce outside of prod, that you cannot diagnose with just a core dump, and that is a showstopper. It's one thing to ignore a minor performance degradation, but if the problem corrupts your state you cannot ignore it. Moreover, if you are in the cloud, part of your infrastructure is not under your control, making it even harder to reproduce a problem.
I've worked with companies at Netflix's scale and they still have last-resort ssh access to their systems.