At some point, your website will go down. It's not a question of if—it's when. Hardware fails. Software has bugs. DNS propagates incorrectly. Certificates expire. Databases crash.
The difference between a minor hiccup and a major disaster isn't whether downtime happens. It's how fast you detect it, how well you respond, and how effectively you prevent it from recurring.
This guide covers everything about website downtime: understanding it, surviving it, and minimizing it.
Dealing with an outage right now? Skip to what to do when it happens or create an emergency status page instantly.
What is Website Downtime?
Website downtime is any period when your site or service is unavailable or unusable for visitors. This includes:
- Complete outage: Server returns errors (500, 502, 503) or doesn't respond at all
- Partial outage: Some pages work, others don't. Homepage loads but checkout is broken
- Functional downtime: Pages load but functionality is broken—can't log in, can't submit forms, can't complete purchases
- Performance degradation: Site loads but is so slow that it's effectively unusable
The last two are often worse than complete outages because they're harder to detect. A server returning 500 errors triggers monitoring alerts. A checkout that silently fails might not.
Further reading: Why uptime alone isn't enough.
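A minimal sketch of how a single check result could be mapped onto the four categories above. The `CheckResult` type, the `classify` function, the 5-second slowness threshold, and the idea of looking for an expected piece of text in the body are all illustrative assumptions, not features of any particular monitoring tool. A partial outage cannot be seen from one check, so it shows up only when you run this against several pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    status_code: Optional[int]  # None means the server did not respond at all
    elapsed_seconds: float
    body: str


def classify(result: CheckResult, expected_text: str, slow_after: float = 5.0) -> str:
    """Map one check result onto the downtime categories (hypothetical helper)."""
    if result.status_code is None or result.status_code >= 500:
        return "complete outage"
    if expected_text not in result.body:
        # The page loaded, but the content that proves it works is missing.
        return "functional downtime"
    if result.elapsed_seconds > slow_after:
        # Assumed threshold: tune it to what "unusable" means for your site.
        return "performance degradation"
    return "ok"
```

The point of the design is that a 200 status code alone never returns "ok": the body and the response time have to pass as well.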
Types of Downtime
Planned Downtime
Scheduled maintenance, deployments, migrations. You know it's coming and can prepare: notify users, schedule during low-traffic hours, have a rollback plan.
Planned downtime is increasingly avoidable with zero-downtime deployments, but some operations (major database migrations, infrastructure changes) still require it.
Unplanned Downtime
The bad kind. Something breaks unexpectedly: server crash, bug in a deploy, DDoS attack, expired certificate, DNS issue. No warning, no preparation.
Partial Downtime
Only some users or some features are affected. A CDN node fails in Asia, your payment processor has an outage, a single microservice crashes. Often harder to detect than full outages.
Common Causes of Website Downtime
Understanding why sites go down helps you prevent it. Here are the most common causes:
Server and Infrastructure
- Hardware failure: Disk crashes, memory failures, network cards dying
- Resource exhaustion: Out of memory, disk full, CPU maxed out
- Cloud provider outage: AWS, Google Cloud, Azure having a bad day
Software and Code
- Bad deployment: A code push that introduces a crash or critical bug
- Database issues: Slow queries, connection pool exhaustion, deadlocks
- Memory leaks: Application gradually consumes all available memory until it crashes
Network and DNS
- DNS misconfiguration: Wrong records, propagation issues, expired domain
- SSL certificate expiry: Certificate expires, browsers block access
- CDN failure: Content delivery network has a regional or global outage
External Factors
- DDoS attacks: Flood of traffic overwhelms your infrastructure
- Third-party service failure: Payment processor, auth provider, or API dependency goes down
- Traffic spike: Viral content or launch event exceeds server capacity
Human Error
- Configuration mistakes: Wrong environment variables, deleted database, misconfigured firewall
- Forgot to renew: Domain, certificate, or hosting payment lapsed
The Real Cost of Downtime
Direct Revenue Loss
When your site is down, you can't sell. The formula is simple: hourly revenue × hours of downtime = lost sales.
Calculate yours: Downtime cost calculator.
Indirect Costs
Direct revenue loss is only the beginning:
- SEO impact: Extended downtime (hours to days) can cause pages to be de-indexed. Recovery can take days or weeks.
- Customer trust: Users who experience outages are less likely to return or recommend you
- Support costs: Every minute of downtime generates support tickets and social media complaints
- Lifetime value: A churned customer doesn't just cost one sale—you lose their entire future spend
- Team productivity: Incident response pulls your team away from building
Scale Matters
| Monthly Revenue | Cost per Hour | Cost per Day |
|---|---|---|
| $1,000 | $1.37 | $33 |
| $10,000 | $13.70 | $329 |
| $100,000 | $137 | $3,288 |
| $1,000,000 | $1,370 | $32,877 |
Remember: actual costs including reputation damage are typically 2-3x the direct revenue loss.
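The arithmetic behind the table can be sketched as follows. This assumes revenue is spread evenly across an average month of 730 hours (365 × 24 / 12), which is a simplification, since real traffic has peaks. The function names and the `indirect_multiplier` parameter are illustrative; set the multiplier to 2 or 3 to apply the rough estimate above.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 hours in an average month


def hourly_cost(monthly_revenue: float) -> float:
    """Direct revenue lost per hour, assuming revenue is spread evenly."""
    return monthly_revenue / HOURS_PER_MONTH


def downtime_cost(monthly_revenue: float, hours_down: float,
                  indirect_multiplier: float = 1.0) -> float:
    """Hourly revenue x hours of downtime, optionally scaled for indirect costs."""
    return hourly_cost(monthly_revenue) * hours_down * indirect_multiplier


# Print one row per revenue level: cost per hour and cost per 24-hour day.
for revenue in (1_000, 10_000, 100_000, 1_000_000):
    per_hour = hourly_cost(revenue)
    per_day = downtime_cost(revenue, 24)
    print(f"${revenue:>9,}  ${per_hour:>8,.2f}/hour  ${per_day:>9,.0f}/day")
```

The printed figures match the table up to rounding. If most of your sales happen in a few peak hours, an outage during those hours costs more than this even-spread estimate suggests.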
What to Do When Your Site Goes Down
Your monitoring alert just fired. Your site is down. Here's what to do, in order:
Verify the outage (1 minute)
Check from a different network/device. Is it really down or is it your connection? Check your monitoring dashboard for confirmation.
Quick uptime check →
Communicate immediately (2 minutes)
Update your status page. Post on social media if appropriate. Don't wait until you know the cause—acknowledge the issue first.
Create emergency status page →
Identify the cause (5-30 minutes)
Check server logs, error rates, recent deployments, infrastructure status, third-party service status pages.
Fix or mitigate (varies)
Roll back the last deployment. Restart the service. Scale up resources. Route around the failure. Fix the bug.
Verify recovery (5 minutes)
Confirm the site is back from multiple locations and browsers. Check that all critical functionality works, not just the homepage.
Update communication (2 minutes)
Update your status page to "resolved." Post a brief summary. Thank users for patience.
Communicating During Downtime
How you communicate during an outage matters as much as how fast you fix it.
The Rules
- Communicate fast: Don't wait until you know the cause. "We're aware of an issue and investigating" is better than silence.
- Be honest: Don't minimize or hide. Users can tell when you're being evasive.
- Update regularly: Every 15-30 minutes during an active incident, even if there's no new information.
- Use the right channel: Status page is primary. Social media for reach. Email for major incidents.
- Close the loop: Post a resolution update and, for major incidents, a post-mortem.
Full guide: How to communicate during incidents.
Get your status page ready: Status page best practices.
Preventing Downtime
You can't eliminate downtime completely, but you can dramatically reduce its frequency and duration.
Monitor Everything That Matters
You can't prevent what you can't see. Comprehensive monitoring catches problems before they become outages: response time degradation, resource usage trends, certificate expiry dates.
Complete monitoring checklist →
Multi-Region Monitoring
Single-point monitoring misses regional failures. Check from multiple locations to catch CDN issues, DNS problems, and routing failures.
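As a rough illustration, results from several vantage points can be combined so that a regional failure is reported differently from a global one. The `summarize_regions` function and the region names in the usage example are hypothetical; how the per-region checks are actually run is left out.

```python
def summarize_regions(results: dict[str, bool]) -> str:
    """results maps a region name to True (check passed) or False (check failed)."""
    failed = sorted(region for region, ok in results.items() if not ok)
    if not failed:
        return "up everywhere"
    if len(failed) == len(results):
        return "down everywhere"
    # Some regions pass and some fail: likely a CDN, DNS, or routing problem.
    return "regional failure: " + ", ".join(failed)


print(summarize_regions({"us-east": True, "eu-west": True, "ap-south": False}))
```

A single-location monitor sitting in `us-east` would have reported this example as healthy.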
Multi-region monitoring →
Monitor SSL and Domains
Certificate and domain expiry cause entirely preventable downtime. Set alerts weeks in advance.
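A minimal sketch of a certificate expiry check using only the Python standard library. The function names and the 30-day warning window are assumptions chosen for illustration; pick a window that leaves you enough time to renew. Domain expiry needs a separate data source (such as your registrar) and is not covered here.

```python
import socket
import ssl
import time


def days_until(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def certificate_days_left(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Fetch the server certificate and return the days left before it expires.

    An already-expired certificate fails verification during the handshake,
    so this raises ssl.SSLCertVerificationError rather than returning a
    negative number. Treat that exception as an alert too.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return days_until(not_after, time.time())


def needs_alert(days_left: float, warn_days: int = 30) -> bool:
    """True once the certificate is inside the assumed warning window."""
    return days_left <= warn_days
```

Run on a daily schedule, something like `needs_alert(certificate_days_left("example.com"))` would flag the certificate weeks before browsers start blocking access.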
Monitor Background Jobs
Cron jobs and background tasks fail silently. Heartbeat monitoring catches these before they cause visible problems.
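The idea behind heartbeat monitoring can be sketched in a few lines: the job reports in each time it finishes, and the monitor raises an alert when a report is overdue. The `is_overdue` function and the grace period are illustrative assumptions, not the behavior of a specific product.

```python
from datetime import datetime, timedelta


def is_overdue(last_ping: datetime, expected_every: timedelta,
               grace: timedelta, now: datetime) -> bool:
    """True when the job has not reported in within its interval plus a grace period."""
    return now > last_ping + expected_every + grace


# Example: an hourly job with a 10-minute grace period that last reported at 09:00.
last = datetime(2030, 1, 1, 9, 0)
print(is_overdue(last, timedelta(hours=1), timedelta(minutes=10),
                 now=datetime(2030, 1, 1, 10, 5)))   # still inside the grace period
print(is_overdue(last, timedelta(hours=1), timedelta(minutes=10),
                 now=datetime(2030, 1, 1, 10, 15)))  # past the deadline
```

The grace period keeps a job that runs a little long from triggering a false alarm.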
Heartbeat monitoring →
Have a Status Page Ready
Create your status page before you need it. During an incident is the worst time to set one up.
Status pages →
After the Incident: Learning and Improving
The most valuable time is after an incident, when the details are fresh. Every outage is an opportunity to get more reliable.
Run a Post-Mortem
Within 24-48 hours of a significant incident, document:
- What happened (timeline)
- Why it happened (root cause)
- How it was detected
- How it was resolved
- What you'll do to prevent it recurring
Implement Action Items
A post-mortem without action items is just documentation. Assign specific tasks with deadlines: add a new monitor, fix the deployment process, update the runbook.
Track Improvement Over Time
Measure your uptime percentage, mean time to detect (MTTD), and mean time to recovery (MTTR). These should improve as you learn from incidents.
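One way these three numbers might be computed from an incident log is sketched below. The `Incident` record and function names are hypothetical. Definitions of MTTR vary between teams; this sketch measures recovery from the moment the incident started, not from the moment it was detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime
    detected: datetime
    resolved: datetime


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    # Assumption: recovery time runs from incident start to resolution.
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


def uptime_percent(incidents: list[Incident], period: timedelta) -> float:
    downtime = sum((i.resolved - i.started for i in incidents), timedelta())
    return 100 * (1 - downtime / period)


log = [
    Incident(datetime(2030, 1, 3, 10, 0), datetime(2030, 1, 3, 10, 4), datetime(2030, 1, 3, 10, 30)),
    Incident(datetime(2030, 1, 9, 12, 0), datetime(2030, 1, 9, 12, 2), datetime(2030, 1, 9, 12, 10)),
]
print(mean_time_to_detect(log), mean_time_to_recovery(log))
print(round(uptime_percent(log, timedelta(days=30)), 3))
```

Tracking the same three figures month over month shows whether post-mortem action items are paying off.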
Frequently Asked Questions
What is an acceptable amount of downtime?
Industry standard for most businesses is 99.9% uptime (about 43 minutes of downtime per month). Critical systems target 99.99% (about 4 minutes/month). 100% uptime is practically impossible. What matters is minimizing downtime duration and communicating well when it happens.
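The figures above follow from simple arithmetic, sketched here under the assumption of a 30-day month. The function name is illustrative.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period at a given uptime target."""
    return (1 - uptime_percent / 100) * days * 24 * 60


for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.1f} minutes per 30 days")
```

Each additional nine cuts the budget by a factor of ten, which is why 99.99% usually requires redundancy and automated failover, not just faster manual response.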
How do I know if my website is down?
The fastest way is automated uptime monitoring that checks your site every 1-5 minutes and alerts you immediately. Without monitoring, you'll typically find out from customer complaints—hours after the outage started. Set up monitoring to detect issues within minutes.
What causes the most website downtime?
The three most common causes are: 1) Software bugs and bad deployments, 2) Infrastructure and hosting issues, and 3) Human error (configuration mistakes, expired certificates, forgotten renewals). Third-party service failures and DDoS attacks are also significant causes.
Does downtime affect SEO?
Brief downtime (minutes) has minimal SEO impact. Extended downtime (hours to days) can cause Google to temporarily de-index pages, drop rankings, and stop crawling. Recovery after extended downtime can take days or weeks. Fast detection and resolution minimize SEO damage.
How can I prevent website downtime?
You can't prevent all downtime, but you can reduce it: use monitoring to detect issues fast, implement redundancy (multiple servers, CDN), use zero-downtime deployments, monitor SSL and domain expiry, and have an incident response plan ready. Post-mortems after incidents help prevent recurrence.
Downtime Is Inevitable. Disasters Aren't.
Every website experiences downtime. What separates well-run sites from disaster stories is preparation: monitoring that catches issues fast, communication that maintains trust, and processes that prevent recurrence.
Start with monitoring. Add a status page. Build a response plan. Learn from every incident. Your uptime will improve steadily.