The scenario described indicates there were deltas between the testing environment (where the DR strategy was tested) and the production environment. It was probably related to OS updates/changes being applied over time, resulting in configuration drift. I've always found it good practice to build and maintain a staging environment that mirrors the production environment in all respects. When configuration changes (e.g. a security patch) are needed, they are tested and validated in the staging environment before they are deployed to production. This gives you an opportunity to test and validate the results in a non-production (i.e. no rules, no SLAs) environment. Part of this involves validating that procedures such as DR will continue to work as expected on the new configuration before it gets rolled out to production. In my experience, this methodology minimizes unexpected behavior (i.e. downtime) in the production environment. I would recommend this methodology (or anything similar) regardless of the OS/distribution you're using.
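One lightweight way to catch that kind of drift before a DR test, assuming you can export an installed-package inventory from each host (the file names and package lists below are fabricated for illustration), is to diff the inventories of staging and production:

```shell
#!/bin/sh
# Sketch: detect configuration drift by diffing installed-package
# inventories. In practice these files would come from each host,
# e.g. the output of `dpkg -l` or `rpm -qa`; here we fabricate two
# small sample inventories so the script is self-contained.

cat > staging.txt <<'EOF'
kernel 6.1.55
nginx 1.24.0
openssl 3.0.11
EOF

cat > production.txt <<'EOF'
kernel 6.1.55
nginx 1.24.0
openssl 3.0.9
EOF

# Any lines unique to either file indicate drift that should be
# reconciled (and the DR procedure re-validated) before relying on it.
if diff -u production.txt staging.txt; then
    echo "no drift detected"
else
    echo "drift detected: reconcile before trusting DR procedures"
fi
```

Running this on a schedule (or as a gate before each DR exercise) turns "staging mirrors production" from an assumption into something you actually verify.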
allegory|11 years ago
The DR environment was the staging environment for the patches. Periodically the production kit would be block-copied back to the DR environment and sysprepped.
Every step to prevent differences between the two clustered environments was taken.