Stop Chasing Five Nines: The Math Doesn't Add Up

Kevin Brown on Feb 11, 2023

[Figure: availability ruler showing exponentially widening gaps between 99%, 99.9%, 99.99%, and 99.999%, highlighting the cost of five nines]

“We need five nines.”

I’ve heard this in planning meetings more times than I can count. Five nines (99.999% availability) sounds impressive: only about 5 minutes of downtime per year. It’s the kind of number you put in a pitch deck.

I once worked with a startup spending 40% of their infrastructure budget chasing 99.99% availability. Multi-region failover, global load balancing, a 24/7 on-call rotation burning out their small team. When I asked what they lost during downtime, the answer was about $2,000 per hour. They were spending $150,000 annually to save maybe $15,000 in downtime costs. Their competitors shipped features faster because they weren’t over-engineering infrastructure.

The worst part? Their users couldn’t tell the difference. The app was a B2B tool used primarily during business hours. Most “downtime” happened at 3am when nobody was logged in. And their payment gateway only offered 99.95%, so they were chasing 99.99% on a system that could never exceed 99.95% because of a dependency they couldn’t control.

This is the trap: availability targeting becomes a badge of engineering honor rather than an economic decision. Let me show you the math that kills the five nines argument.

The Math That Kills Five Nines

Before diving into costs, let’s ground the discussion in concrete numbers.

Availability percentages translated to actual downtime.
Availability	Annual Downtime	Monthly Downtime
99% (two nines)	87.66 hours	7.31 hours
99.9% (three nines)	8.77 hours	43.83 minutes
99.99% (four nines)	52.60 minutes	4.38 minutes
99.999% (five nines)	5.26 minutes	26.30 seconds

The jump from 99.9% to 99.99% looks small: just 0.09 percentage points. But in downtime terms, you’re going from nearly 9 hours per year to under an hour. Each additional nine cuts the downtime budget by 10x and, as a rough rule of thumb, costs about 10x more than the previous one.
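The table can be reproduced with a few lines of Python. This is a minimal sketch rather than anything standard: the function name `downtime_hours` is made up for illustration, and it assumes an average year of 365.25 days, which is what the table's figures imply.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year including leap days


def downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime an availability target allows over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return period_hours * (1.0 - availability)


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        yearly = downtime_hours(target)
        monthly = downtime_hours(target, HOURS_PER_YEAR / 12)
        print(f"{target:.3%}: {yearly:8.2f} h/year, {monthly * 60:8.2f} min/month")
```

Passing a different `period_hours` (a quarter, a billing cycle) gives the budget for whatever window your SLA is actually measured over.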

Composite Availability: The Hidden Killer

Here’s where it gets uncomfortable. Your system’s availability isn’t determined by your best component — it’s the product of all your components:

$A_{system} = A_1 \times A_2 \times A_3 \times \cdots \times A_n$

Three services at 99.9% each?

System availability = 0.999 × 0.999 × 0.999 = 0.997 (99.7%)

Three 99.9% components became one 99.7% system. You lost about half a “nine” and tripled your downtime budget just by having dependencies.

This is why microservices architectures often have worse availability than monoliths unless carefully designed. Every network hop, every service call, every database query is another multiplicative factor dragging your availability down.
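To see how quickly the product erodes, here is a small sketch of the serial formula. It assumes every component sits on the request path and that failures are independent; `serial_availability` is an illustrative name, not a library function.

```python
from math import prod


def serial_availability(components: list[float]) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes component failures are independent of each other.
    """
    return prod(components)


three_services = serial_availability([0.999, 0.999, 0.999])
ten_services = serial_availability([0.999] * 10)

print(f"3 services at 99.9%:  {three_services:.4%}")
print(f"10 services at 99.9%: {ten_services:.4%}")
```

Under those assumptions, ten hops at 99.9% each land at roughly 99.0%: a full nine gone before any single service has done anything wrong.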

The good news: redundancy works in the opposite direction. When you have multiple components that can handle the same request, failures have to occur simultaneously to cause an outage. Two 99% servers in parallel? The math inverts — you multiply failure probabilities instead:

Availability = 1 - (0.01 × 0.01) = 99.99%

Two cheap servers achieved what one expensive server could not.

This is the fundamental insight behind high-availability architectures: redundancy is cheaper than perfection. Two mediocre servers behind a load balancer can beat one expensive server, provided they fail independently and the load balancer itself isn’t the weak link.
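The parallel case can be sketched the same way. The big assumption here is that replica failures are independent, which shared power, shared deploys, and shared bugs all violate in practice. The 99.95% load balancer figure is a hypothetical value chosen only to show the effect of putting something in series with the redundant pair.

```python
from math import prod


def parallel_availability(replicas: list[float]) -> float:
    """Availability when any single replica can serve the request.

    Assumes replica failures are independent, which real deployments
    only approximate.
    """
    return 1.0 - prod(1.0 - a for a in replicas)


pair = parallel_availability([0.99, 0.99])

# Hypothetical: a 99.95% load balancer sits in front of the pair (in series).
load_balancer = 0.9995
pair_behind_lb = load_balancer * pair

print(f"Two 99% servers in parallel: {pair:.4%}")
print(f"Same pair behind the LB:     {pair_behind_lb:.4%}")
```

The second number is the one users experience: the redundant pair reaches four nines on paper, but anything non-redundant in front of it pulls the total back down.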

Your system availability cannot exceed your least available dependency. If your payment provider is 99.9%, your checkout flow cannot be 99.99% no matter how much you spend on your own infrastructure.
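One way to make the ceiling concrete is to ask what your own stack would have to deliver to hit a target through a hard dependency. This is a sketch under the serial model above; `required_own_availability` is a hypothetical helper, and it treats the dependency as being on every request.

```python
from typing import Optional


def required_own_availability(target: float, dependency: float) -> Optional[float]:
    """Availability your own stack needs to hit `target` through a hard dependency.

    Returns None when the target is unreachable because it exceeds what
    the dependency itself offers.
    """
    if not (0.0 < target <= 1.0 and 0.0 < dependency <= 1.0):
        raise ValueError("target and dependency must be fractions in (0, 1]")
    required = target / dependency
    return required if required <= 1.0 else None


# Checkout at 99.99% through a 99.9% payment provider: not reachable.
print(required_own_availability(0.9999, 0.999))

# Checkout at 99.9% through a 99.95% provider: your side needs about 99.95% too.
print(required_own_availability(0.999, 0.9995))
```

A `None` here is the signal to stop spending on your own infrastructure and either renegotiate the dependency or design around it (queueing, retries, a second provider).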

The ROI Calculation

Here’s what the 10x rule looks like in practice:

Infrastructure cost multipliers by availability tier.
Tier	Typical Architecture	Relative Cost
99%	Single region, basic monitoring	1x (baseline)
99.9%	Multi-AZ, automated recovery	2-3x
99.99%	Multi-region active-active	10-20x
99.999%	Exotic redundancy everywhere	50-100x+

The jump from 99.9% to 99.99% isn’t just more servers — it’s fundamentally different complexity. You go from regional redundancy to global redundancy, introducing cross-region latency, data consistency challenges, and failure modes that don’t exist in simpler deployments.

But infrastructure is just the visible cost. Hidden costs are often larger: senior SREs instead of junior ops, 24/7 staffing, expensive APM tooling, and the opportunity cost of features not built. A sustainable 24/7 on-call rotation needs 4-5 engineers minimum — at $150k fully loaded cost each, that’s $600k-$750k annually just for humans.
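Putting the staffing figure next to the infrastructure multiplier shows which one dominates. This is a rough sketch: the $150k loaded cost and the 4-5 engineer rotation come from the paragraph above, while the $20k/year infrastructure baseline is a made-up number for illustration.

```python
LOADED_COST_PER_ENGINEER = 150_000  # fully loaded annual cost, per the article


def on_call_staffing_cost(engineers: int, loaded_cost: int = LOADED_COST_PER_ENGINEER) -> int:
    """Annual cost of the people needed to staff an on-call rotation."""
    return engineers * loaded_cost


def total_annual_cost(infra_baseline: int, infra_multiplier: float, engineers: int) -> float:
    """Infrastructure at a given tier multiplier plus on-call staffing."""
    return infra_baseline * infra_multiplier + on_call_staffing_cost(engineers)


# Hypothetical baseline: $20k/year of single-region infrastructure.
baseline = 20_000

print(on_call_staffing_cost(4), on_call_staffing_cost(5))
print(total_annual_cost(baseline, infra_multiplier=10, engineers=4))
```

With these placeholder inputs, a 10x infrastructure bill is still only a quarter of the total; the rotation is the rest.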

When Does It Pay Off?

The formula is simple:

$\text{Investment justified when: } \text{Downtime Cost} \times \text{Hours Saved} > \text{Prevention Cost}$

Here’s a calculator that makes it concrete:

# Should we go from 99.9% to 99.99%?
hours_per_year = 8760      # 365-day year
revenue_per_hour = 10_000  # $10k/hour lost during downtime
cost_to_achieve = 150_000  # $150k/year for multi-region

current_downtime = hours_per_year * (1 - 0.999)   # ~8.76 hours
target_downtime = hours_per_year * (1 - 0.9999)   # ~0.88 hours
hours_saved = current_downtime - target_downtime  # ~7.88 hours

revenue_saved = hours_saved * revenue_per_hour    # ~$78,840
roi = (revenue_saved - cost_to_achieve) / cost_to_achieve  # ~-47%

print(f"Hours saved: {hours_saved:.2f}, revenue saved: ${revenue_saved:,.0f}, ROI: {roi:.0%}")
# Result: costs more than it saves!

ROI calculator showing when availability investments don’t pay off.

Even at $10,000 per hour of lost revenue, which is substantial for most companies, the jump from 99.9% to 99.99% doesn’t pay off. You’d spend $150k to save about $79k. The crossover point where four nines makes sense is around $19,000 per hour of revenue at risk.
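The crossover figure falls out of rearranging the inequality: divide the prevention cost by the hours saved. A minimal sketch, using the same 365-day year as the calculator; `breakeven_revenue_per_hour` is an illustrative name.

```python
HOURS_PER_YEAR = 8760  # 365-day year, matching the ROI calculator


def breakeven_revenue_per_hour(
    current: float,
    target: float,
    annual_cost: float,
    hours_per_year: float = HOURS_PER_YEAR,
) -> float:
    """Hourly downtime cost at which an availability upgrade pays for itself."""
    hours_saved = hours_per_year * (target - current)
    if hours_saved <= 0:
        raise ValueError("target must be higher than current availability")
    return annual_cost / hours_saved


crossover = breakeven_revenue_per_hour(0.999, 0.9999, 150_000)
print(f"Break-even downtime cost: ${crossover:,.0f}/hour")
```

Plug in your own upgrade cost and tiers; if your real hourly loss sits below the result, the upgrade loses money on revenue grounds alone.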

The biggest hidden cost is opportunity cost. Engineering hours spent achieving 99.99% are hours not spent building features that might grow revenue faster than the avoided downtime costs.

Most businesses don’t have that kind of revenue at risk. An e-commerce site doing $50 million annually averages about $5,700 per hour, and downtime rarely loses 100% of that since customers often return later.
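The "customers return later" effect can be folded into the hourly figure with a recovery fraction. This is only a sketch: the 0.5 recovery fraction is a placeholder, not a measured number, and it assumes revenue is spread evenly across the year, which flatters off-peak outages and understates peak ones.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year, revenue spread evenly across it


def average_revenue_per_hour(annual_revenue: float) -> float:
    """Annual revenue divided evenly over every hour of the year."""
    return annual_revenue / HOURS_PER_YEAR


def effective_loss_per_hour(annual_revenue: float, recovered_fraction: float) -> float:
    """Hourly revenue actually lost after some customers come back later."""
    if not 0.0 <= recovered_fraction <= 1.0:
        raise ValueError("recovered_fraction must be between 0 and 1")
    return average_revenue_per_hour(annual_revenue) * (1.0 - recovered_fraction)


gross = average_revenue_per_hour(50_000_000)
net = effective_loss_per_hour(50_000_000, recovered_fraction=0.5)  # 0.5 is a placeholder

print(f"Gross: ${gross:,.0f}/hour, net of returning customers: ${net:,.0f}/hour")
```

Measure your own recovery fraction from past incidents if you can; the net figure is the one that belongs in the ROI inequality.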

How to Push Back

When someone asks for five nines, don’t just nod along. You have the ammunition to have a real conversation.

Push back when:

1. Revenue doesn't justify it. Show the ROI calculation. If downtime hours saved times revenue per hour is less than the cost, the math doesn't work.
2. Dependencies don't support it. "We cannot exceed our payment provider's 99.95%" is a constraint, not an excuse. Show the composite availability math.
3. Better alternatives exist. "We could improve from 99.9% to 99.95% for $70k, or add Feature X for $70k. Which creates more value?"

Reframe the conversation:

Instead of “We can’t do five nines,” try “Here’s what we can achieve at each investment level, and here’s the business impact of each option.” Present it as a menu with business-relevant tradeoffs:

Availability investment options as a menu.
Option	Availability	Annual Downtime	Total Cost
A	99.9%	8.77 hours	$50k/year
B	99.95%	4.38 hours	$120k/year
C	99.99%	52 minutes	$400k/year

Table: Present availability as investment options, not technical requirements.

The first framing sounds like an engineering limitation. The second sounds like strategic thinking.
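A menu like this gets sharper if you also show what each step up costs per hour of downtime avoided. The sketch below uses the availability and cost figures from the table; the `Option` class and `marginal_cost_per_hour` helper are illustrative names, and it assumes a 365.25-day year to match the table's downtime column.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365.25 * 24  # assumption: matches the downtime figures in the menu


@dataclass(frozen=True)
class Option:
    name: str
    availability: float
    annual_cost: int

    @property
    def downtime_hours(self) -> float:
        """Expected annual downtime implied by the availability target."""
        return HOURS_PER_YEAR * (1.0 - self.availability)


def marginal_cost_per_hour(cheaper: Option, pricier: Option) -> float:
    """Extra dollars paid per additional hour of downtime avoided."""
    hours_avoided = cheaper.downtime_hours - pricier.downtime_hours
    return (pricier.annual_cost - cheaper.annual_cost) / hours_avoided


MENU = [
    Option("A", 0.999, 50_000),
    Option("B", 0.9995, 120_000),
    Option("C", 0.9999, 400_000),
]

for cheaper, pricier in zip(MENU, MENU[1:]):
    cost = marginal_cost_per_hour(cheaper, pricier)
    print(f"{cheaper.name} -> {pricier.name}: ${cost:,.0f} per hour of downtime avoided")
```

With the table's numbers, moving from A to B costs roughly $16k per hour avoided and moving from B to C roughly $80k. Those are the figures to set against what an hour of downtime actually costs the business.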

That said, there are exceptions.

When Five Nines Actually Makes Sense

To be fair, there are domains where five nines isn’t overkill. Financial trading systems measure downtime in dollars per millisecond. Healthcare systems controlling medication dispensing can’t afford “we’ll retry in a few minutes.” Air traffic control, nuclear plant monitoring, emergency services dispatch — these systems have regulatory and safety requirements that change the ROI calculation entirely.

The pattern: the cost of failure isn’t measured in lost revenue, it’s measured in lives, regulatory penalties, or market position that can never be recovered. If your system falls into this category, you already know it. If you’re not sure, it probably doesn’t.

The Right Answer for Most Services

For most SaaS products, 99.9% is the right target. It’s achievable with standard cloud tools, sustainable for normal-sized teams, and provides reliability that users perceive as “always works.” Going beyond requires deliberate justification — not engineering ego.

Users have tolerance thresholds. Nobody churns over 5 minutes of monthly downtime. They churn over slow pages and missing features. Spend your reliability budget where it creates the most value.

The goal is not maximum availability — it’s appropriate availability. Accept that some downtime is not just acceptable but economically rational.

That startup I mentioned? They eventually settled on 99.9% and shipped the backlogged features. Their users never noticed the difference — but they did notice the new capabilities.
