BCDR

The Three-Way Gap: Vendor SLAs, Customer Commitments, and Cyber Insurance

Mahesh Chandran

CEO Dataring

A pattern I've seen repeatedly in BCDR retrospectives: an organization suffers a multi-hour outage of a critical SaaS platform, loses a substantial amount of revenue and incurs remediation costs, files an SLA claim with the vendor, and recovers a small fraction of a percent of the actual loss in service credits. The remainder is absorbed by the organization. This is not unusual. It is the normal way SLAs work, and most organizations discover it during a crisis, when the lesson is most expensive.

The gap between what vendor contracts promise and what actual outages cost is one of the most under-examined risks in modern operations. It sits squarely in the domain of business unit leaders — the people who sign vendor contracts, commit to customer SLAs, and own the customer relationships that suffer when things break.

This post is about the framework I use to read vendor contracts, customer SLAs, and cyber insurance with continuity in mind. For the broader BCDR architecture context, see our cloud DR in the GCC pillar. For the downtime economics that sits upstream of this work, see our RTO/RPO post.

The thesis

Three documents describe what should happen when systems fail: the vendor's SLA, your contracts with customers, and your cyber insurance policy. Each document defines "downtime" and "loss" differently. Each one was negotiated separately, by different teams, often years apart. The space between the three definitions is where uninsured, uncompensated risk lives. Most organizations have never measured the gap. The ones that have, plan differently.

The dark matter of downtime economics

In the downtime economics post, I named regulatory and contractual penalties as the dark-matter component of the impact equation — often the largest single line item, rarely measured in advance. Contractual exposure comes from three sources simultaneously, and each sits in a different corner of the organization. Understanding how the three interact is the core of what I'd call continuity literacy.

The Three-Way Gap

Every organization operates inside a triangle of contracts.

The vendor corner. You depend on vendors — SaaS tools, cloud providers, payment processors, AI services. Each has an SLA promising a service level and defining the consequence if they miss it. Vendor SLAs typically commit to service credits, not damages. If a vendor outage causes you significant losses, the vendor's liability is usually capped at a small percentage of fees you paid during the affected period. This is structural in most commercial vendor contracts and rarely negotiable below the enterprise tier.

The customer corner. You sell services to customers, and those customers increasingly require contractual uptime commitments. Enterprise customers often demand 99.9% uptime with financial penalties for breach, sometimes with escalating penalties for repeated outages and termination rights. Your commitments to customers are frequently stricter than what your vendors commit to you.

The insurance corner. You carry some combination of cyber insurance, errors-and-omissions coverage, and business interruption insurance. Each has triggers and exclusions that determine whether a particular incident is covered. Most organizations assume their cyber insurance covers more than it actually does.

The three corners form the Three-Way Gap: the space between what your vendors promise you, what you promise customers, and what insurance will cover when both promises are broken. Every organization has one. Most have never measured it. The gap is where uncompensated risk lives.

Decoding vendor SLAs

Most vendor SLAs share a common structure: an uptime commitment, a measurement methodology, a list of exclusions, and a remedy. Each component tends to favor the vendor in ways that aren't obvious until you read carefully.

The uptime commitment is the headline number: 99.9%, 99.99%, or just 99%. The math matters: over a 365-day year, 99.9% allows roughly 8.76 hours of downtime, while 99% allows 87.6 hours, ten times more. That difference is often the difference between tolerable and catastrophic. Check the exact number, not the marketing language.
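The arithmetic behind an uptime percentage is simple enough to check yourself. The sketch below assumes a 365-day year (8,760 hours); the function name `allowed_downtime_hours` is illustrative, not taken from any vendor tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime an uptime commitment permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# Compare the common headline numbers side by side.
for commitment in (99.0, 99.5, 99.9, 99.99):
    hours = allowed_downtime_hours(commitment)
    print(f"{commitment}% uptime allows {hours:.2f} hours of downtime per year")
```

Passing a different `period_hours` (for example `30 * 24` for a 30-day month) shows how little room a monthly measurement window leaves: about 43 minutes at 99.9%.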

The measurement methodology determines what counts as downtime. Per calendar month or rolling window? Does degraded performance (the service is technically up but unusably slow) count as downtime, or only full outages? From the vendor's internal monitoring or from customer-visible endpoints? The answers determine whether outages you experience will count toward the SLA at all. In the post-mortems I've reviewed, customers often discover that the outages they experienced don't meet the contractual definition of downtime.
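To see how much the definition of downtime matters, here is a minimal sketch that scores the same month two ways. The incident log, the 30-day month, and the `Incident` type are all hypothetical; a real contract may define degradation, windows, and rounding differently.

```python
from dataclasses import dataclass

MINUTES_IN_MONTH = 30 * 24 * 60  # 43,200; assumes a 30-day calendar month


@dataclass
class Incident:
    minutes: int
    kind: str  # "full" outage or "degraded" performance


def measured_uptime(incidents: list[Incident], count_degraded: bool) -> float:
    """Uptime percent for the month under a given contractual definition."""
    counted = sum(
        i.minutes for i in incidents
        if i.kind == "full" or (count_degraded and i.kind == "degraded")
    )
    return 100 * (MINUTES_IN_MONTH - counted) / MINUTES_IN_MONTH


# Hypothetical month: one short full outage and one long slowdown.
month = [Incident(20, "full"), Incident(180, "degraded")]

contractual = measured_uptime(month, count_degraded=False)
experienced = measured_uptime(month, count_degraded=True)
print(f"Full outages only:   {contractual:.3f}%")
print(f"Including slowdowns: {experienced:.3f}%")
```

With these made-up numbers, the vendor meets a 99.9% commitment if only full outages count, and misses it if the three-hour slowdown counts too. The customer lived through the same month either way.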

The exclusions are usually where most of the real risk hides. Common exclusions: scheduled maintenance windows (planned downtime doesn't count, even if it affects you), force majeure (natural disasters, wars, and government actions, all increasingly relevant in the GCC), customer-caused issues (if the vendor argues your actions contributed, the SLA may not apply), and third-party failures (when the vendor's own upstream provider goes down, the vendor often disclaims responsibility).

The remedy structure is usually service credits: a percentage of monthly fees refunded against future billing. Credits are designed to preserve the vendor relationship, not to compensate for losses. The structural ratio between credit value and business impact is severe — often well under 1% of actual loss for a mid-sized outage. This isn't a mistake or bad-faith negotiation; it's how SLAs are written by default.
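A quick way to make the credit-versus-loss ratio concrete is to compute it. Every figure below is hypothetical and chosen only for illustration: the monthly fee, the credit percentage, and the outage loss are not drawn from any real contract or incident.

```python
def service_credit(monthly_fee: float, credit_percent: float) -> float:
    """Credit owed under a percentage-of-monthly-fee remedy."""
    return monthly_fee * credit_percent / 100


def recovery_ratio(credit: float, actual_loss: float) -> float:
    """Share of the actual loss that the credit recovers (0.0 to 1.0)."""
    if actual_loss <= 0:
        raise ValueError("actual_loss must be positive")
    return credit / actual_loss


# Hypothetical inputs, for illustration only.
monthly_fee = 5_000.0      # what you pay the vendor each month
credit_percent = 10.0      # credit tier triggered by the outage
actual_loss = 250_000.0    # lost revenue plus remediation costs

credit = service_credit(monthly_fee, credit_percent)
ratio = recovery_ratio(credit, actual_loss)
print(f"Service credit: {credit:,.0f}")
print(f"Recovered: {ratio:.2%} of actual loss")
```

Under these assumptions the credit is 500 against a loss of 250,000, or 0.2%. Substituting your own fee schedule and a realistic loss estimate gives a first-order measure of the vendor corner of the gap.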

The key insight: SLAs protect the vendor, not the customer. They define the minimum the vendor commits to, not the maximum protection the customer receives. A leader who treats an SLA as insurance is making a category error.

Customer-side commitments

If vendor SLAs protect vendors, customer SLAs protect your customers. This is the side of the triangle where business leaders have the most leverage, because customer SLAs are usually written by someone in your organization — often you or your team.

The risk on this side: the promises you're making to customers don't match the protections you're receiving from your vendors. If you commit to 99.9% uptime with your top 20 enterprise accounts, and the vendor you rely on for the supporting service commits to only 99.5%, you're absorbing the difference. The vendor can be down for about 43.8 hours a year without breaching its SLA, while you have promised customers no more than about 8.8. That leaves roughly 35 hours of allowable vendor downtime per year that you've contractually promised away. Those hours are your liability, not the vendor's.
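The mismatch can be expressed as a single number per vendor. This is a minimal sketch assuming a 365-day year and that both commitments are measured over the same window, which real contracts often do not guarantee; `downtime_gap_hours` is an illustrative name.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_gap_hours(customer_uptime: float, vendor_uptime: float,
                       period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours per period the vendor may be down beyond what you promised customers.

    Returns 0.0 when the vendor's commitment is at least as strict as yours.
    """
    vendor_allowance = period_hours * (100 - vendor_uptime) / 100
    customer_allowance = period_hours * (100 - customer_uptime) / 100
    return max(0.0, vendor_allowance - customer_allowance)


gap = downtime_gap_hours(customer_uptime=99.9, vendor_uptime=99.5)
print(f"Unprotected vendor downtime: {gap:.2f} hours per year")
```

For the 99.9% and 99.5% pairing discussed above, the gap comes to about 35 hours a year. Running it across every vendor that sits behind a customer commitment gives a rough ranking of where the exposure is largest.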

The honest question: for each customer commitment your team has made, can your current technical infrastructure and vendor relationships actually deliver? In most assessments I've done, the answer is no for at least some customer commitments — typically those signed by sales without operations or legal input. The most common moment of discovery is when a customer files a claim after an outage and your legal team has to figure out whether you're liable.

Enterprise customers are increasingly sophisticated about this. They send detailed third-party risk questionnaires as part of procurement, asking for evidence of tested DR plans, board-approved continuity programs, and specific RTO/RPO targets. These questionnaires are not theater. The answers are reviewed by the customer's risk team, who make procurement decisions based on what you disclose. Vendor incidents traced to inadequate continuity practices are a meaningful procurement concern, which is why the questionnaires keep getting longer.

Cyber insurance and the BCDR connection

The third corner is insurance, specifically cyber insurance, which has become mandatory or near-mandatory in much of the GCC financial sector. SAMA, QCB, and other GCC regulators have introduced or strengthened cyber insurance expectations for financial institutions in recent years. Specific minimum coverage levels and effective dates vary; institutions should verify current requirements with their supervising regulator rather than relying on summaries.

Two realities about how cyber insurance actually works surprise most leaders.

A meaningful share of cyber insurance claims are denied or reduced. Industry reporting consistently shows that claims are frequently denied because the policyholder couldn't demonstrate required security controls at the time of the incident. The most common reasons: missing multi-factor authentication, inadequate backup and disaster recovery, and weak vendor management. A policy that would have paid out had MFA been deployed will not pay out if MFA was missing, regardless of whether MFA had anything to do with the specific incident. Specific denial-rate figures vary by source and year; the pattern is consistent.

Premiums are materially lower for organizations that can demonstrate BCDR maturity. Underwriters ask specific questions about business continuity: documented BCP, testing cadence, leadership review of test results, evidence. Organizations that can answer yes substantively typically pay lower premiums than those that can't, for the same coverage level. The financial value of mature BCDR practices appears on the insurance invoice.

Business unit leaders contribute in two ways. The tested MVB plan for your function (see our MVB post) is exactly the evidence underwriters ask for. Your participation in tabletop exercises (see our DR testing post) is documentable proof of leadership engagement with continuity risk. Both contribute directly to coverage and premium.

A GCC-specific caveat: most cyber insurance and standard business interruption policies exclude acts of war and state-level conflict. Cloud providers have invoked force majeure during regional disruptions, limiting their own SLA liability exactly when customers most needed protection. Organizations operating in the GCC should understand whether their insurance covers scenarios specific to the region — kinetic incidents affecting infrastructure, regional conflict, cross-border data restrictions during emergencies — and negotiate explicitly if it doesn't. The standard policy is not written for the GCC context.

What to verify before signing any vendor contract

The best time to close the Three-Way Gap is before signing. After the contract is in place, leverage drops sharply. Before signing any contract with a vendor your team will depend on:

What is the exact uptime commitment, measured how? Get the number, the measurement window, and the methodology in writing. Verify it matches your customer commitments.

What are the exclusions? Read carefully. Ask the vendor to walk through specific scenarios: what happens if their cloud provider goes down? During scheduled maintenance that exceeds its window? During force majeure?

What is the remedy, and is there a liability cap? Most caps are measured in months of fees, not in business impact. Understand the cap and whether it's negotiable.

What is the incident notification timeline? How quickly must the vendor tell you about an outage? Many vendors commit to notifying customers within hours; some have no contractual timeline at all. For regulated entities with their own notification obligations, the vendor's notification deadline needs to be shorter than the regulator's, so that you still have time to assess the incident and report it.

What are the data export procedures at termination? How many days do you have? In what format? Is there a charge? Vendors sometimes impose extraction fees that make migration prohibitive — a soft lock-in that matters during continuity planning.

Does the vendor carry its own insurance, and can they share the certificate? If a vendor's failure causes you damage, your ability to recover depends partly on whether the vendor is insured.

What are the subprocessor disclosure and consent requirements? Many SaaS tools depend on other SaaS tools. Subprocessors are fourth-party risk you may not see. Good contracts require disclosure and consent before changes.

What is the change-of-control clause? If the vendor is acquired or sold, what happens to your data, your service, your pricing? Many organizations lose critical services during vendor acquisitions because no one negotiated this.

Risk questionnaires flow both ways

Risk questionnaires move in both directions. Your team receives them from customers; your team sends them to vendors. In both directions, continuity literacy matters.

When you receive a questionnaire from a customer, it's typically handed to IT or security to fill out. This is a mistake. IT can answer technical questions but can't substantively answer the continuity questions — RTOs, RPOs, documented BCPs, test frequency, board governance — without business leader input. Filled out by someone who doesn't know what your function does, the answers will be either too conservative (losing deals) or too optimistic (creating exposure). Neither is acceptable.

When you send a questionnaire to a vendor, the same rule applies in reverse. Don't accept written answers as definitive. Ask follow-up questions on anything vague, and ask for evidence — recent test reports, sample incident response documentation, proof of insurance. Serious vendors provide it. Vendors that can't or won't are a flag.

Five quarterly actions for a business unit leader

1. Review your top three vendor contracts. Pull the SLA terms, data export provisions, and incident notification timelines for the three vendors your team depends on most. If any are unacceptable, initiate a renegotiation conversation. Many vendors will improve terms for existing customers who ask. Almost none do without being asked.

2. Ask legal or procurement about your customer-side commitments. Specifically: do any of your customer contracts include continuity commitments your organization may not currently be able to meet? The answer is often yes for at least some accounts. That is information you need to have.

3. Contribute to the next vendor risk questionnaire response. When the next enterprise customer sends one, ask to review the draft response before it goes out. Focus on the continuity section. Make sure the answers are accurate and the evidence cited matches what your function does.

4. Ask your CFO or risk manager about cyber insurance. What does it cover? What does it exclude? What does the underwriter require? How does your function's BCDR maturity affect the premium? It is a 30-minute conversation that frequently surfaces both unknown risks and reducible costs.

5. Create a Continuity Card for your function. One page: critical vendors, their SLA terms, your customer commitments, the gap between them, and the mitigations you have in place. Update quarterly. Share with IT and legal. The card becomes the authoritative source for answering continuity questions and makes every subsequent conversation — with IT, customers, insurers, auditors — faster and more productive.
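For teams that prefer to keep the Continuity Card in a structured form rather than a document, one possible shape is sketched below. The field names, the example vendors, and the `unmitigated` helper are assumptions made for illustration; the one-page card described above does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class VendorEntry:
    name: str
    vendor_uptime_percent: float    # what the vendor's SLA commits to
    customer_uptime_percent: float  # strictest customer commitment relying on it
    mitigations: list[str] = field(default_factory=list)

    def gap_hours_per_year(self) -> float:
        """Vendor downtime allowance beyond your customer promise (365-day year)."""
        hours = 365 * 24
        diff = self.customer_uptime_percent - self.vendor_uptime_percent
        return max(0.0, hours * diff / 100)


@dataclass
class ContinuityCard:
    function: str
    last_reviewed: str  # ISO date; the card is meant to be updated quarterly
    vendors: list[VendorEntry] = field(default_factory=list)

    def unmitigated(self) -> list[VendorEntry]:
        """Vendors with a positive gap and no mitigation recorded."""
        return [v for v in self.vendors
                if v.gap_hours_per_year() > 0 and not v.mitigations]


# Hypothetical card for illustration only.
card = ContinuityCard(
    function="Customer Support",
    last_reviewed="2025-01-15",
    vendors=[
        VendorEntry("Ticketing SaaS", 99.5, 99.9),
        VendorEntry("Telephony", 99.9, 99.9),
        VendorEntry("Chat platform", 99.0, 99.9, ["manual email fallback"]),
    ],
)
for v in card.unmitigated():
    print(f"{v.name}: {v.gap_hours_per_year():.1f} unprotected hours/year")
```

The point of structuring it this way is that the quarterly update becomes a diff: new vendors, changed SLA terms, and gaps that still have no mitigation all show up directly.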

Tradeoffs and honest limitations

The gap is structural, and partial closure is the realistic goal. No amount of work eliminates the Three-Way Gap entirely. Some vendor liability will always be capped below your actual loss, some customer commitments will be tighter than your vendor protections, and some classes of incident will sit outside insurance coverage. The goal is to see the gap clearly, not to close it absolutely.

Negotiation leverage varies sharply with vendor size. Hyperscale cloud providers and large SaaS vendors negotiate from a position of strength; smaller vendors are often more flexible. Plan your negotiation effort accordingly: focus on the vendors where you have leverage, and accept that for some critical vendors the contract is the contract.

Insurance markets are dynamic. Premiums, exclusions, and underwriter requirements change. Annual review with your broker is the minimum cadence; quarterly check-ins are more useful in regions with active threat environments.

A practical takeaway

If your function has not measured its Three-Way Gap, the highest-leverage 30-day project is filling out a single Continuity Card for your most critical vendor. One page. Vendor SLA terms, your customer commitments that depend on this vendor, the gap between them, and the mitigations in place. The exercise will surface the gap. Once the first card exists, the others scale at low marginal cost.

For the framework that sets RTO/RPO inputs, see our downtime economics post. For the prioritization work upstream, see our MVB post. For the dependency-mapping layer, see our SaaS and AI dependency post. For the testing layer, see our DR testing post. For the broader regional context, see our cloud DR in the GCC pillar. If you'd like outside support reviewing your vendor contracts, customer SLA exposure, or insurance posture, Dataring's resilience practice works with leaders across the GCC. Get in touch.