How to Choose a BCDR Pattern Without Overbuilding

Mahesh Chandran

CEO, Dataring

Most BCDR pattern conversations I've sat in on in the GCC start at the architecture layer: should we go active-active, hub-and-spoke, or multi-provider? That's the wrong starting question. The right starting question is which workloads actually justify each pattern, because the failure mode I see most often in DR planning isn't under-investment. It's spending Tier-0 budgets on Tier-2 systems while leaving the Tier-0 systems exposed.

This post is a short decision guide, not a deep dive. It assumes you've read our pillar guide on cloud disaster recovery in the GCC, which explains the three patterns in full: Hub-and-Spoke with Remote DR (Pattern A), Active-Active Multi-Region (Pattern B), and Multi-Provider Cross-Region (Pattern C). What follows is the decision framework I use with clients, condensed into something a leadership team can apply in an afternoon.

The thesis

Choosing a BCDR pattern is not primarily an architecture decision. It is a portfolio decision about which workloads get which level of resilience, made at the business layer and translated to the architecture layer afterwards. Organizations that get this right end up with a tiered strategy where one pattern is rarely the answer for everything.

A four-question diagnostic

Before discussing patterns, work through these four questions for each workload, or for each tier of workloads. The answers narrow your options before anyone draws a diagram.

1. How long can this workload be unavailable before the business consequence becomes unacceptable?

Be specific. "We can't be down" is not an answer. "We lose roughly $X per hour of downtime, and we face regulatory exposure if we're down for more than Y minutes" is an answer. If the answer is measured in minutes, you're in Pattern B or Pattern C territory. If it's measured in hours, Pattern A is usually sufficient. If it's measured in days, you probably don't need a multi-region pattern at all. Strong cross-region backup is enough.

2. How much data can you afford to lose?

This is the RPO question, framed in business terms. If the answer is "any transaction loss is a regulatory and customer-trust problem" (typical for payments, core banking, trading), you need synchronous replication and Pattern B or C. If the answer is "we can replay the last 15 minutes from upstream sources," asynchronous replication and Pattern A are fine. The most common gap I see is teams designing zero-RPO architectures for workloads where the upstream system already retains the data.

3. What is your annual budget for this workload's resilience?

Pattern A roughly doubles your storage and replication bandwidth costs but keeps compute minimal. Pattern B roughly doubles full infrastructure cost and adds latency-related performance overhead. Pattern C adds dual-platform engineering capacity on top of Pattern B costs. If your budget is constrained, Pattern A done well outperforms Pattern B done badly.

4. Do you have the engineering depth to operate this pattern in production?

This is the question that gets skipped most often. Pattern B requires synchronous database replication, global traffic management, and application-level multi-region awareness. Pattern C requires all of that plus proficiency in two cloud platforms, abstracted IAM, DNS and monitoring, and out-of-band orchestration. I have seen teams design Pattern C on paper before they had operationalized Pattern A. The result is usually a pattern that looks resilient in the architecture diagram and fails the first time it is tested under load.
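To make the diagnostic concrete, here is a minimal Python sketch of how the four answers might narrow the options for a single workload. Everything in it is illustrative: the `WorkloadAnswers` fields, the `narrow_options` function, and the one-hour and one-day cut-offs are assumptions chosen to mirror the "minutes, hours, days" wording above, not fixed thresholds.

```python
from dataclasses import dataclass

HOUR = 60          # minutes
DAY = 24 * HOUR    # minutes


@dataclass
class WorkloadAnswers:
    """Hypothetical record of the four diagnostic answers for one workload."""
    name: str
    max_downtime_minutes: float         # Q1: tolerable outage
    data_loss_acceptable: bool          # Q2: True if lost data can be replayed upstream
    budget_covers_full_duplicate: bool  # Q3: can the budget carry a second full stack?
    team_can_run_multi_region: bool     # Q4: has the team operated this in production?


def narrow_options(w: WorkloadAnswers) -> tuple[list[str], list[str]]:
    """Return (candidate patterns, warnings). Cut-offs are assumptions."""
    # Q1 and Q2: downtime measured in minutes, or no tolerance for data loss,
    # points to Pattern B or C; hours points to Pattern A; days to backup only.
    if w.max_downtime_minutes < HOUR or not w.data_loss_acceptable:
        options = ["Pattern B", "Pattern C"]
    elif w.max_downtime_minutes < DAY:
        options = ["Pattern A"]
    else:
        options = ["Cross-region backup"]

    # Q3 and Q4 do not change the requirement, but they flag delivery risk.
    warnings = []
    if "Pattern B" in options:
        if not w.budget_covers_full_duplicate:
            warnings.append("budget: Pattern A done well may beat Pattern B done badly")
        if not w.team_can_run_multi_region:
            warnings.append("engineering depth: operationalize Pattern A first")
    return options, warnings


if __name__ == "__main__":
    payments = WorkloadAnswers("payments", 5, False, True, False)
    print(narrow_options(payments))
```

The sketch deliberately keeps questions 3 and 4 as warnings rather than filters: a thin budget or an inexperienced team does not shrink the business requirement, it only tells you the requirement cannot yet be met safely.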

The Pattern–Tier Matrix

Once you've worked through the four questions, the recommended pattern usually falls out cleanly. The matrix below is the one I use to anchor the conversation:

Tier 0 — downtime tolerance of seconds to minutes. Payment gateways, core banking, trading platforms, emergency citizen services. Recommended pattern: Pattern B (Active-Active).

Tier 0 sovereign — downtime tolerance of minutes to ~1 hour. Critical national infrastructure with a regulatory or contractual multi-provider mandate. Recommended pattern: Pattern C (Multi-Provider).

Tier 1 — downtime tolerance of 1 to 4 hours. Core business applications, customer portals, ERP, CRM. Recommended pattern: Pattern A (Hub-and-Spoke).

Tier 2 — downtime tolerance of 4 to 24 hours. Internal reporting, BI, logistics back-office. Recommended pattern: Pattern A or strong cross-region backup.

Tier 3 — downtime tolerance of 24+ hours. Dev and test, internal tools, knowledge bases. Recommended pattern: cross-region backup.

Two things worth noting. First, very few organizations need Pattern C, and most that think they need it are better served by Pattern B with a documented secondary-provider fallback for the most critical subset. Second, Tier 2 and Tier 3 are where I see the most overbuilding. These workloads frequently get DR architectures that cost more than the workload they're protecting.
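For teams that keep their workload inventory in code or a spreadsheet export, the matrix above can be expressed as a simple lookup. The Python below is a sketch only: `classify_tier` and `recommended_pattern` are hypothetical names, the classification uses downtime tolerance alone (a real Business Impact Analysis also weighs revenue impact, regulatory exposure, and operational dependency), and the handling of values that fall exactly on a boundary is an assumption.

```python
# Tier-to-pattern recommendations, copied from the matrix in this post.
PATTERN_BY_TIER = {
    "Tier 0": "Pattern B (Active-Active)",
    "Tier 0 sovereign": "Pattern C (Multi-Provider)",
    "Tier 1": "Pattern A (Hub-and-Spoke)",
    "Tier 2": "Pattern A or strong cross-region backup",
    "Tier 3": "Cross-region backup",
}


def classify_tier(downtime_tolerance_hours: float,
                  multi_provider_mandate: bool = False) -> str:
    """Map a downtime tolerance to a tier. Boundary handling is an assumption."""
    # A regulatory or contractual multi-provider mandate with a tolerance of
    # up to about one hour is the "Tier 0 sovereign" row.
    if multi_provider_mandate and downtime_tolerance_hours <= 1:
        return "Tier 0 sovereign"
    if downtime_tolerance_hours < 1:
        return "Tier 0"
    if downtime_tolerance_hours < 4:
        return "Tier 1"
    if downtime_tolerance_hours < 24:
        return "Tier 2"
    return "Tier 3"


def recommended_pattern(downtime_tolerance_hours: float,
                        multi_provider_mandate: bool = False) -> str:
    """Look up the matrix recommendation for a given tolerance."""
    tier = classify_tier(downtime_tolerance_hours, multi_provider_mandate)
    return PATTERN_BY_TIER[tier]


if __name__ == "__main__":
    for name, hours in [("payment gateway", 0.05), ("ERP", 2), ("BI", 12), ("wiki", 48)]:
        print(name, "->", classify_tier(hours), "->", recommended_pattern(hours))
```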

Three things from the comparison worth surfacing

The full breakdown across RTO, RPO, cost, complexity, regulatory alignment, and provider dependency is in the pillar guide. Three points from that comparison are worth surfacing here, because they're where I most often see decisions go wrong.

Pattern B does not reduce single-provider risk.

If your primary cloud provider's global control plane is degraded, both regions of a Pattern B deployment can become unreachable at the same time. Pattern B handles regional infrastructure failure. It does not handle provider-wide failure. Organizations that need to survive provider-wide failure need Pattern C, or at minimum a documented manual fallback to a secondary provider for the most critical workloads.

Pattern A's cost advantage depends on disciplined warm-standby management.

If your DR region's compute footprint creeps upward over time, because someone leaves a test environment running, or a team starts using DR capacity for analytics, the cost gap with Pattern B narrows quickly. Pattern A is cheap when it is actually warm-standby. When it drifts toward partial-active, you are paying close to Pattern B costs without Pattern B's failover characteristics.
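A rough back-of-the-envelope model shows how quickly that drift erodes the gap. The Python sketch below uses the "roughly doubles" heuristics from the budget question earlier in this post; the function names, the `standby_compute_fraction` parameter, and the cost figures are all hypothetical, and the model ignores Pattern B's latency overhead and any provider-specific pricing.

```python
def pattern_a_cost(compute: float, storage: float, bandwidth: float,
                   standby_compute_fraction: float) -> float:
    """Primary stack plus doubled storage and replication bandwidth,
    with DR compute kept at a fraction of primary compute."""
    return compute * (1 + standby_compute_fraction) + 2 * storage + 2 * bandwidth


def pattern_b_cost(compute: float, storage: float, bandwidth: float) -> float:
    """Full infrastructure duplicated in a second active region."""
    return 2 * (compute + storage + bandwidth)


if __name__ == "__main__":
    # Hypothetical annual costs in arbitrary units.
    compute, storage, bandwidth = 100.0, 20.0, 10.0
    b = pattern_b_cost(compute, storage, bandwidth)
    for fraction in (0.1, 0.4, 0.8):
        a = pattern_a_cost(compute, storage, bandwidth, fraction)
        print(f"standby compute at {fraction:.0%}: Pattern A is {a / b:.0%} of Pattern B")
```

Tracking the DR region's compute as a fraction of primary compute, and alerting when it crosses an agreed ceiling, is one simple way to keep Pattern A honest.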

Pattern C's hardest problem is data, not compute.

Cross-provider compute failover is well-trodden ground. Cross-provider data synchronization at meaningful scale is where engineering teams spend most of their time. If your Pattern C plan glosses over the data layer, that is the part that will fail in a real test.

A decision sequence I'd actually use

For most GCC organizations sitting down to choose a pattern for the first time, the sequence I would recommend is:

Step 1. Run a Business Impact Analysis. Classify every workload into Tier 0 through Tier 3 based on revenue impact, regulatory exposure, and operational dependency. Get this signed off by the business, not just IT.

Step 2. Map each tier to a recommended pattern using the matrix above. Resist the urge to upgrade everything to the most resilient option.

Step 3. Cost the recommended patterns at the tier level. If the total is unrealistic, you don't have a pattern problem. You have a tier-classification problem. Re-examine which workloads really belong in each tier.

Step 4. Pick the highest-priority Tier 0 system and design Pattern B (or C) for it as your reference implementation. Get one Tier 0 right before scaling.

Step 5. For Tier 1, design one Pattern A reference implementation. Once Tier 0 and Tier 1 references are working and tested, scale them across the rest of the portfolio.

The mistake I see most often is organizations trying to design DR for their entire portfolio in one go. The portfolio approach gets stuck at architecture review, often for months. The reference-implementation approach gets one workload protected within a quarter, which is more useful than a perfect plan that hasn't shipped.
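Step 3 in the sequence above is the one most easily reduced to arithmetic. As a minimal sketch, assuming each workload already carries a tier label and an estimated annual DR cost (the dictionary keys, function names, and figures below are hypothetical), the tier-level roll-up and budget check might look like this in Python:

```python
from collections import defaultdict


def cost_by_tier(workloads: list[dict]) -> dict[str, float]:
    """Sum estimated annual DR cost per tier."""
    totals: dict[str, float] = defaultdict(float)
    for w in workloads:
        totals[w["tier"]] += w["annual_dr_cost"]
    return dict(totals)


def check_budget(workloads: list[dict], annual_budget: float) -> dict:
    """Compare the portfolio total with the budget and list tiers by cost.

    An over-budget result is a prompt to re-examine tier classification,
    not to redesign the patterns.
    """
    totals = cost_by_tier(workloads)
    total = sum(totals.values())
    return {
        "totals": totals,
        "total": total,
        "within_budget": total <= annual_budget,
        # Most expensive tiers first: one possible order for the review.
        "review_order": sorted(totals, key=totals.get, reverse=True),
    }


if __name__ == "__main__":
    portfolio = [
        {"name": "payments", "tier": "Tier 0", "annual_dr_cost": 400.0},
        {"name": "erp", "tier": "Tier 1", "annual_dr_cost": 120.0},
        {"name": "bi", "tier": "Tier 2", "annual_dr_cost": 90.0},
        {"name": "wiki", "tier": "Tier 3", "annual_dr_cost": 10.0},
    ]
    print(check_budget(portfolio, annual_budget=500.0))
```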

Tradeoffs and honest limitations

A few things this framework does not solve are worth flagging explicitly:

Data residency constraints can override the recommended pattern. If your regulator requires data to remain within a specific jurisdiction, Pattern A and Pattern B may be impossible without a pre-negotiated exception framework. That is a regulatory engagement to start now, not a technical problem to solve later. For financial services, the relevant context is in our SAMA CSF guide.

The matrix assumes a single primary region. If you already operate from multiple regions (for example, one in the UAE and one in KSA for residency reasons), the pattern conversation looks different. The relevant question becomes whether the existing regions provide adequate geographic separation, and whether they share dependencies that would fail together.

Engineering maturity matters more than budget. I have seen well-funded Pattern B deployments fail in test because the team that designed them moved on, and the operating team didn't have the muscle memory to run a real failover. Whichever pattern you pick, plan for the operating team's capacity, not just the build team's ambition.

A practical takeaway

If you're sitting down with this question for the first time, the most useful 90 minutes you can spend is not reading more architecture material. It's running a Business Impact Analysis with your business leads and producing a tier classification. The pattern decision usually becomes obvious once the tiers are clean. If your tiers are unclear, no amount of architecture analysis will give you a defensible answer.

For the full technical depth on each pattern, see our cloud DR in the GCC guide. For the regulatory context driving these decisions in financial services, see our SAMA CSF guide. If you'd like help running a BIA or building a tier-to-pattern map for your portfolio, Dataring's resilience practice runs this kind of engagement regularly. Get in touch.