Disaster Recovery Testing Is An Imperfect Strategy
So you’ve been convinced of the importance of conducting a disaster recovery test. You’ve done one, sometime within the last year. You got your applications up and met your recovery objectives, and you updated your plan to correct any errors.
That means everything’s good, and if there happens to be a disaster in the next 12 months, you’ll just smoothly recover, no problem, right? Unfortunately, that’s not the case. It’s important to do a disaster recovery test, but it’s also important to realize that a test is a simulation, and simulations never match reality completely.
Testing Isn’t a Perfect Model of Disaster
If your test process is a tabletop readthrough and walkthrough of the recovery process, that’s better than nothing, but it isn’t a real test of the recovery process. Remember, computers are very literal, but people aren’t. Make a typo or miss a step in the process, and the people reading through the test plan may not even notice. The computer definitely will, though.
It’s better to actually run through the recovery process and let the computer find any errors in the plan, but there are still factors that make this different from executing the plan during an actual disaster:
- The DR test is scheduled. The test is planned and prepped for; it isn’t a surprise. People have already found and reviewed the documents and aren’t scrambling to figure out what they’re supposed to do. The people you need to perform the recovery are available, not out on vacation and not gone from the company as of last week. Many companies prepare for a DR test by checking their DR site and making sure its systems are on the same versions as production; when an unexpected disaster happens, whatever happens to be installed at the DR site is what you’ve got to work with.
- The DR plan is up to date. You’re testing the latest version of the DR plan, which was no doubt updated just before this year’s test. By the time a disaster happens six months from now, new systems will have been brought online and others will have been retired. Some will have been upgraded to new versions, and some will have picked up new dependencies. A solid change control process can help keep the DR plan updated as these changes are made, but the plan probably won’t completely reflect reality.
- There’s no impact on business. DR tests are carefully scheduled and managed to make sure they don’t impact the business. That’s fine for a test, but during a real disaster, when business users are unable to get work done, you’re likely to feel intense pressure to get systems back up faster. Of course, doing things fast increases the chances of doing them wrong. It’s also likely that, during a test, there won’t be any partially completed transactions that need to be recovered to prevent lost business.
- Test scope may not match disaster scope. Not all disasters require an all-hands-on-deck approach to recovery. If you test recovering from a small disaster, you won’t have rehearsed recovering from a big one. If you focus only on the big recoveries, you may not have a simpler plan for handling smaller crises.
None of these issues mean you shouldn’t do a DR test. They just mean you can’t count on a test in January to prepare you for a disaster in November. Keep your test plan up to date throughout the year, and consider doing a DR test quarterly.
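Between tests, one way to notice the plan drifting away from reality is to compare what production is running against what the DR site has installed. The Python sketch below is only an illustration of that idea. It assumes you can export each environment as a simple mapping of system name to version; the function name, the report fields, and the sample systems are all hypothetical, not part of any particular DR tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DriftReport:
    """Differences between production and the DR site (hypothetical structure)."""
    missing_at_dr: list[str] = field(default_factory=list)
    retired_but_at_dr: list[str] = field(default_factory=list)
    # system name -> (production version, DR site version)
    version_mismatch: dict[str, tuple[str, str]] = field(default_factory=dict)

    @property
    def clean(self) -> bool:
        """True when no drift of any kind was found."""
        return not (self.missing_at_dr or self.retired_but_at_dr or self.version_mismatch)


def compare_inventories(production: dict[str, str], dr_site: dict[str, str]) -> DriftReport:
    """Compare two {system name: version} inventories and report the drift.

    Assumption: both inventories use the same system names and version strings.
    """
    report = DriftReport()
    for name, prod_version in sorted(production.items()):
        if name not in dr_site:
            # Brought online in production but never added to the DR site.
            report.missing_at_dr.append(name)
        elif dr_site[name] != prod_version:
            # Present in both places, but on different versions.
            report.version_mismatch[name] = (prod_version, dr_site[name])
    # Still installed at the DR site but no longer in production.
    report.retired_but_at_dr = sorted(set(dr_site) - set(production))
    return report


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    production = {"billing-db": "14.2", "web-frontend": "3.1.0", "reporting": "2.0"}
    dr_site = {"billing-db": "13.9", "web-frontend": "3.1.0", "legacy-fax": "1.4"}

    report = compare_inventories(production, dr_site)
    print("Missing at DR site:", report.missing_at_dr)
    print("Retired but still at DR site:", report.retired_but_at_dr)
    print("Version mismatches:", report.version_mismatch)
```

Run on a schedule, a check like this would flag new, retired, and upgraded systems well before the next test, though it says nothing about dependencies, staffing, or the other gaps listed above.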
Prescient Solutions designs and implements disaster recovery solutions that prepare you to survive an IT crisis. Contact us to learn about developing, testing, and executing a successful disaster recovery strategy.