
You Have a Disaster Recovery Plan. Does It Actually Work? How Business Leaders Can Tell the Difference
BCDR

Mahesh Chandran
CEO, Dataring
Here's a question that separates theoretical preparedness from real preparedness: when was your organization's disaster recovery plan last tested with an actual failover, and what were the results?
If the honest answer is "I don't know" or "I'm not sure we've ever done that," you are not alone. You are, in fact, in the majority. Roughly 34% of companies never test their backups, backup restoration has a 50% failure rate in real incidents, and Forrester has observed that disaster recovery testing approaches are "largely unchanged since 2008" despite enormous changes in the underlying technology. The gap between having a plan and having a plan that works is one of the most underappreciated risks in modern business.
This post is an educational framework for business leaders who want to know whether their organization's continuity program is real or theatrical. It teaches a specific diagnostic you can use without technical expertise, explains what tabletop exercises are and why business leaders belong in the room, and gives you a practical template for the conversation to have with your IT team.
The confidence gap
Ask most executives whether their organization has a disaster recovery plan and you'll get a confident "yes." Ask whether the plan has been tested recently and you'll usually get a hesitant "I believe so." Ask what the test revealed and you'll often get silence.
This is the confidence gap: the measurable distance between what leaders believe about their preparedness and what their preparedness actually is. The gap exists everywhere but is especially pronounced in organizations where continuity is treated as a compliance checkbox rather than an operational capability. The plan exists in a document, the document satisfies the auditor, the auditor signs off, and everyone moves on. Nobody actually runs the plan to see if it works.
The data on this is grim. Beyond the 34% of companies that never test backups and the 50% restoration failure rate, Forrester found that 41% of organizations have never performed a full disaster recovery simulation. A plan that has never been executed under realistic conditions has an unknown success rate, and when that rate is finally measured under real conditions, it tends to be far lower than anyone expected. The untested plan is essentially a well-intentioned hope document.
Worse, the existence of an untested plan creates false confidence, which is more dangerous than having no plan at all. A business leader who knows there is no plan behaves cautiously, invests in backup processes, and pressures IT for improvements. A business leader who believes there is a plan — because somebody said so, somebody filed documents, somebody passed an audit — behaves as if the problem is solved. The behavioral difference is enormous, and it is entirely driven by whether the plan has been stress-tested against reality.
Three levels of false confidence
False confidence about disaster preparedness typically comes in three flavors. Each one has a specific tell, and each one can be diagnosed without technical expertise.
Level 1: "We have a plan"
The organization has a documented DR plan. It's a PDF somewhere in SharePoint, or a Confluence page, or a section in the employee handbook. It satisfies the auditors who come through every year. The tell: when you ask to see it, nobody knows where it lives; or when you find it, the contact list names people who left the company two years ago; or when you skim it, it references systems that were decommissioned eighteen months ago.
A plan that hasn't been reviewed against current operations in six months has significant gaps. A plan that hasn't been reviewed in eighteen months is essentially fiction. The diagnostic question is not "do we have a plan?" The diagnostic question is "when was the plan last reviewed against current operations, by whom, and what changes were made as a result?"
Level 2: "IT handles it"
The business leader has delegated all responsibility for continuity to the IT team. IT is technically competent and is genuinely trying to protect the organization, but IT is protecting infrastructure, not business processes. They know how to restore a SQL database, fail over a virtual machine, and reroute traffic between regions. They don't know which of your team's workflows are Tier 1, which can pause, or which customers must be contacted first during an outage.
This isn't IT's fault. They cannot know these things without being told, and business leaders who delegate continuity entirely to IT are withholding exactly the information IT needs. The result is a plan that optimizes for the metrics IT is measured on (time to restore systems) rather than the metrics the business actually cares about (time to resume critical business processes). These are not the same thing. The gap between "the database is back up" and "my team can actually do their job" is often several hours and sometimes several days.
Level 3: "We've backed everything up"
Backups exist. They run every night. They consume storage and budget. But they have never been tested for restoration. The organization assumes that backups automatically equal recovery, which is a category error: backups are necessary but not sufficient, and many backups silently fail to be restorable for reasons that only become visible when you actually try to restore them.
Industry research consistently finds that roughly 60% of data backups are incomplete or corrupted in ways that prevent successful restoration, and restoration has a 50% failure rate in real incidents. A backup that cannot be restored is not a backup. It is, at best, a wish. The organization that has never run a full restoration drill is protecting its data against the idealized threat model of "loss" without protecting it against the actual threat model of "loss plus imperfect backup pipeline plus restoration complexity plus time pressure."
Why plans decay
Even a well-written DR plan degrades over time through a process that resilience practitioners call plan decay. Nobody edits the document, yet it grows steadily less accurate, because the business itself keeps changing underneath it.
The forces that cause decay are mundane. Staff turnover: the person named as the incident coordinator left two months ago. New SaaS tools: your team adopted three new tools in the last quarter, none of which are in the plan. System migrations: IT moved the core application to a different cloud region but didn't update the DR runbook. Organizational restructuring: the department that used to own a critical process was dissolved and the process is now shared across two teams, neither of which thinks it's their responsibility. New regulatory requirements: a customer contract signed last quarter requires a 4-hour RTO for services currently protected at an 8-hour target.
Each of these changes is small. Each of them should trigger an update to the DR plan. Almost none of them do, because nobody's job is "keep the DR plan accurate" and the plan is only looked at once a year during the compliance cycle.
A useful diagnostic for plan decay is the newspaper test: if your organization's disaster recovery plan were published on the front page tomorrow, would it accurately describe how your organization would actually respond to a major incident? For most organizations the honest answer is no. The plan describes an idealized response that hasn't been true for at least a year.
What a tabletop exercise actually is
A tabletop exercise (TTX) is a structured discussion in which a team walks through a hypothetical disaster scenario, making decisions at each stage, without any technology actually being affected. It is the simplest, cheapest, and most effective form of disaster preparedness testing, and it is criminally underused.
Three misconceptions keep business leaders away from tabletop exercises.
The first misconception: "Tabletops are a technical exercise." They are not. Tabletops test decision-making, communication, and coordination — not system recovery. The participants who matter most are the people who make operational decisions under pressure, which usually means business unit leaders, not IT engineers. IT engineers are useful participants because they can answer technical questions, but they are not the main audience. The main audience is the people who have to decide which customers to call, which processes to prioritize, which decisions can wait and which cannot.
The second misconception: "I'll be exposed as technically ignorant." Many business leaders avoid tabletops because they expect to be asked technical questions they can't answer. But tabletops are not technical exams. The questions are about business judgment: "Which customers do we contact first?" "What do we tell the board?" "Which deadline can we slip?" "Who has authority to make this decision?" These are exactly the questions business leaders are paid to answer, and practicing them in a low-stakes environment is the whole point.
The third misconception: "Tabletops are optional if we have a real DR plan." They are not. The tabletop is the only place where you find out whether your plan survives contact with reality. It is where you discover that the escalation tree has gaps, that two teams think the same decision is someone else's job, that the process for notifying customers depends on a system that would be down during the exact scenario where you need to notify customers.
Your role in a tabletop exercise
If your organization runs tabletops, your job as a business unit leader is specific and well-defined. Here's what to do.
Before the exercise. Review your Minimum Viable Business Canvas if you have one — the exercise will test whether the priorities you set on paper survive under simulated pressure. Review the current RTOs and RPOs for the systems your team depends on (using the downtime economics framework). Come prepared with a clear understanding of your function's critical processes.
During the exercise. Your job at each stage is to answer three questions honestly: (1) What does my team do right now, given what we know? (2) Who do I need to communicate with, and what do I tell them? (3) What decisions am I authorized to make, and what decisions do I need to escalate?
Answer these questions out loud, in real time, as if the scenario were real. The goal is not to give perfect answers — it's to surface the places where you don't have good answers, because those are the gaps the exercise is designed to find.
After the exercise. The most valuable part of the tabletop happens after the scenario ends. Document every gap, confusion, and missing piece that came up. Share them with your team. Update your Dependency Chain Map (from the dependency mapping framework) if the exercise revealed unmapped dependencies. Schedule one follow-up to verify that each gap has been addressed.
A tabletop exercise that surfaces ten gaps and addresses none of them is worse than no tabletop at all — it creates documentation that looks like preparedness without any of the underlying work. A tabletop that surfaces ten gaps and addresses five of them in the following month is genuinely valuable.
Seven questions to diagnose your organization's readiness
You can diagnose your organization's real preparedness in about 20 minutes with seven questions. You don't need any IT knowledge to ask them. The answers tell you most of what you need to know.
1. When was our DR plan last updated? If the answer is more than twelve months, the plan is almost certainly inaccurate. If the answer is more than eighteen months, the plan is fiction.
2. Does the plan name specific people, and are those people still in those roles? Staff turnover breaks DR plans faster than any other force. A plan that names people who have left the company is a plan that can't execute.
3. Has the plan been tested with an actual restoration drill, and what were the results? "Yes, last year, and it uncovered these specific issues, which we addressed" is a great answer. "I think so?" is a concerning one.
4. Does the plan cover the SaaS tools my team uses daily? Most organizational DR plans cover on-premises and cloud infrastructure but not SaaS. If your team runs on 30 SaaS tools and the plan mentions five, there are 25 gaps.
5. Have I (as a business leader) ever participated in a DR tabletop or drill? If not, the plan has never been tested against business priorities, only against technical infrastructure.
6. If our primary systems went down right now, does everyone on my team know what to do in the first hour? Ask your direct reports this question directly. Most will say no. That's useful information.
7. Do our customer contracts require us to have a tested DR plan, and can we show evidence of testing? Enterprise customers increasingly include this requirement in procurement. If you can't produce evidence, you have contractual exposure regardless of your technical preparedness.
The progressive testing ladder
There is a natural progression in resilience testing that takes organizations from theoretical preparedness to real readiness. Few organizations are at the top of the ladder, and most aren't aware that there even is a ladder.
Level 1: Plan review. A team reads the current plan together, identifies inaccuracies, and updates contact information and procedures. This is the minimum viable form of testing and should happen quarterly. It catches plan decay before it becomes dangerous.
Level 2: Tabletop exercise. A structured discussion-based walkthrough of a scenario, usually 2 to 4 hours, with business leaders and IT in the same room. This identifies decision-making gaps, coordination failures, and communication breakdowns. It should happen at least annually.
Level 3: Functional drill. A technical test of a specific recovery capability — actually restoring a backup, actually failing over a database, actually switching to a manual process. This validates that individual recovery procedures work. It should happen at least annually for each critical system.
Level 4: Full simulation. A realistic, multi-hour or multi-day exercise that combines scenarios and tests the full chain from detection to recovery, including business continuity alongside technical recovery. This is the gold standard. Few organizations reach this level, and the ones that do are substantially better prepared than their peers.
As a business leader, you should aim to participate in at least Level 2 annually and push your organization to conduct Level 3 testing for your function's most critical processes. The difference between an untested plan and a Level 3-tested plan is the difference between hoping and knowing.
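The drill itself is IT's work, but it helps to know what "actually restoring a backup" involves. The sketch below is illustrative only: it assumes a PostgreSQL database backed up to dump files and relies on placeholder names (the backup path, the scratch database, the sanity query, the row threshold) that would be replaced with your own environment's details. The point it illustrates is simple: a Level 3 drill restores the most recent backup somewhere safe, never into production, and then checks that the restored data looks like real data rather than an empty shell.

```python
#!/usr/bin/env python3
"""Minimal restore-drill sketch (illustrative, not production tooling).

Assumes: PostgreSQL client tools (dropdb, createdb, pg_restore, psql) on PATH,
connection settings supplied via the usual PG* environment variables, and
backups written as .dump files to BACKUP_DIR. All names below are placeholders.
"""

import subprocess
import sys
from pathlib import Path

BACKUP_DIR = Path("/backups/prod")              # hypothetical backup location
SCRATCH_DB = "dr_drill_scratch"                 # throwaway database for the drill
SANITY_QUERY = "SELECT count(*) FROM orders;"   # replace with a check that matters to you
MIN_EXPECTED_ROWS = 1_000                       # drill fails below this threshold


def run(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command and echo it, so the drill leaves an audit trail."""
    print(f"[drill] {' '.join(cmd)}")
    return subprocess.run(cmd, capture_output=True, text=True)


def main() -> int:
    # 1. Pick the newest backup file. A backup you never restore is untested.
    backups = sorted(BACKUP_DIR.glob("*.dump"), key=lambda p: p.stat().st_mtime)
    if not backups:
        print("[drill] FAIL: no backup files found")
        return 1
    latest = backups[-1]

    # 2. Restore into a scratch database so production is never touched.
    run(["dropdb", "--if-exists", SCRATCH_DB])
    run(["createdb", SCRATCH_DB])
    restore = run(["pg_restore", "--dbname", SCRATCH_DB, str(latest)])
    if restore.returncode != 0:
        print(f"[drill] FAIL: restore of {latest.name} errored:\n{restore.stderr}")
        return 1

    # 3. Check that the restored data looks like real data, not an empty shell.
    check = run(["psql", "-d", SCRATCH_DB, "-t", "-A", "-c", SANITY_QUERY])
    if check.returncode != 0:
        print(f"[drill] FAIL: sanity query errored:\n{check.stderr}")
        return 1
    rows = int(check.stdout.strip() or 0)
    if rows < MIN_EXPECTED_ROWS:
        print(f"[drill] FAIL: only {rows} rows restored (expected at least {MIN_EXPECTED_ROWS})")
        return 1

    print(f"[drill] PASS: {latest.name} restored, {rows} rows verified")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Even a sketch this small answers the question most organizations cannot: not "did the backup job run?" but "can we get usable data back out of it?" Run on a schedule, a check like this turns the Level 3 drill from an annual event into a routine habit.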
The conversation to have with IT
Once you've worked through the diagnostic, schedule a conversation with your IT leader or CISO. Here's the structure.
Start by explaining what you're trying to understand: "I want to know how ready we actually are for the systems my team depends on. Can we walk through this together?" This is a collaborative framing, not an audit.
Ask to see the current DR plan for your team's top three systems. Note the last-updated date, the listed owners, the stated RTO/RPO targets, and the documented test history.
Ask the specific question: "When was the last time we actually tested recovery for these systems, and what did we learn?" If the answer is "we don't do that kind of testing" or "it's been a while," you've found the gap.
Ask whether you can participate in the next tabletop exercise for your function. If there are no scheduled tabletops, ask whether one can be scheduled in the next 90 days. Offer to help design the scenario based on your function's critical processes.
End with a specific commitment to follow up in 30 days to confirm progress. Not a vague "let's check in" — a calendar invite with a specific agenda. Business leaders who follow up drive change. Business leaders who don't follow up get the plan on paper and the failure in practice.
The bigger picture
The difference between a DR plan that exists and a DR plan that works is not subtle. It is the difference between the organizations that handle disruptions smoothly and the ones that descend into chaos during every incident. The gap is not closed by better documentation, more thorough audits, or larger IT budgets. It is closed by testing — by rehearsing the scenarios, finding the gaps, fixing them, and rehearsing again.
The business leader's role in this is not to become a DR expert. It is to refuse to accept "we have a plan" as a sufficient answer. It is to ask when the plan was tested, to participate in the testing when it happens, to push for more testing when it doesn't, and to close the feedback loop between what the plan says and what the organization can actually do.
If you want help running a tabletop exercise or assessing your organization's current testing posture, Dataring's BCDR consulting practice facilitates Level 2 through Level 4 exercises across the GCC. Get in touch to schedule a working session, and explore our other posts in this series: downtime economics, the Minimum Viable Business framework, and mapping SaaS and AI dependencies.




