You set up monitoring. You connected your email. You get 47 alerts on Monday morning. By Wednesday, you've stopped reading them. On Friday, your checkout goes down for 3 hours and you miss the alert completely.
This is alert fatigue, and it kills monitoring effectiveness. The solution isn't fewer alerts—it's smarter alerts. The right notification, to the right person, on the right channel, at the right severity.
Setting up monitoring from scratch? Start with our monitoring checklist.
Alert Channels: Choosing the Right One
Different channels serve different purposes. The goal is matching urgency to intrusiveness.
| Channel | Speed | Intrusiveness | Best For |
|---|---|---|---|
| Email | Minutes | Low | Non-urgent alerts, reports, summaries |
| Slack / Discord | Seconds | Medium | Team visibility, day-to-day monitoring |
| SMS | Seconds | High | Critical issues, after-hours |
| Phone call | Immediate | Very high | True emergencies, revenue-impacting outages |
| Push notification | Seconds | Medium-High | Mobile-first teams, on-the-go awareness |
| Webhook | Seconds | None (machine) | Automated responses, custom integrations |
See available channels: PerkyDash alerting options.
The Multi-Channel Approach
Don't rely on one channel. Use a layered approach:
- Always: Slack/Discord for team visibility
- Always: Email as a backup and for records
- Critical only: SMS or phone for truly urgent issues
- Optional: Webhook for automated incident response
Designing Alert Severity Levels
Not every alert is equally important. Define clear severity levels and map them to appropriate channels:
🔴 Critical
Meaning: Site is down, revenue is being lost, users are affected
Examples:
- Homepage returning 500
- Checkout completely down
- API unreachable from all regions
- SSL certificate expired
Channels: Slack + SMS + Phone + Email
Response time: Immediately
🟡 Warning
Meaning: Something is degraded but not completely broken
Examples:
- Response time doubled
- One region failing, others OK
- SSL expiring in 14 days
- Non-critical endpoint down
Channels: Slack + Email
Response time: Within 1 hour during business hours
🔵 Info
Meaning: Something worth noting but not urgent
Examples:
- Service recovered after brief outage
- SSL renewal succeeded
- Response times returned to normal
- Scheduled maintenance reminder
Channels: Email or dashboard only
Response time: Review during next business day
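The severity-to-channel mapping above can be sketched as a simple lookup. This is a minimal illustration, not any particular tool's API; the channel names are placeholders.

```python
# Severity -> channels, mirroring the Critical / Warning / Info levels above.
SEVERITY_CHANNELS = {
    "critical": ["slack", "sms", "phone", "email"],
    "warning": ["slack", "email"],
    "info": ["email"],
}

def channels_for(severity: str) -> list[str]:
    """Return the notification channels for an alert of the given severity."""
    # Unknown severities fall back to email, the least intrusive channel.
    return SEVERITY_CHANNELS.get(severity, ["email"])
```

Keeping the mapping in one place makes it easy to audit and tune as you learn which alerts are noisy.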
Avoiding Alert Fatigue
Alert fatigue is the #1 reason monitoring fails. If you get too many alerts, you stop paying attention. When a real incident happens, you miss it.
Use Confirmation Checks
Don't alert on the first failure. Require 2-3 consecutive failures before triggering a notification. Brief network blips resolve themselves—your team doesn't need to know about every hiccup.
Bad: Alert after 1 failed check (5-minute interval) → false positive every time there's a 10-second network blip
Good: Alert after 3 consecutive failures (15 minutes of confirmed downtime) → real problems only
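The confirmation logic above amounts to a small state machine: count consecutive failures, reset on success, and fire exactly once when the threshold is reached. A minimal sketch:

```python
class ConfirmationGate:
    """Fire an alert only after N consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if check_ok:
            self.failures = 0  # a success resets the streak
            return False
        self.failures += 1
        # Fire exactly once, on the Nth consecutive failure.
        return self.failures == self.threshold
```

With a 5-minute check interval and `threshold=3`, this is the "15 minutes of confirmed downtime" behavior described above.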
Use Multi-Region Validation
If you check from multiple regions, require at least 2 regions to report failure before alerting. A single-region failure might be a local network issue, not a real outage.
Why regions matter: Multi-region monitoring.
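The region quorum rule is a one-liner: count failing regions and compare against a minimum. A sketch, assuming check results arrive as a region-to-status map:

```python
def confirmed_outage(region_results: dict[str, bool], quorum: int = 2) -> bool:
    """True if at least `quorum` regions report a failed check.

    region_results maps region name -> True (check passed) or False (failed).
    """
    failed = sum(1 for ok in region_results.values() if not ok)
    return failed >= quorum
```

A single failing region stays below the quorum and is treated as a local network issue rather than an outage.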
Set Reasonable Thresholds
If your server normally responds in 800ms, don't set a 1-second timeout. A brief spike to 1.2 seconds isn't an incident. Set thresholds based on your baseline + reasonable margin.
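The "baseline + reasonable margin" rule can be made concrete. A sketch with an assumed 50% relative margin (tune this to your own traffic); alerting only when latency strictly exceeds the threshold means a brief spike to exactly the threshold doesn't fire:

```python
def latency_threshold(baseline_ms: float, margin: float = 0.5) -> float:
    """Alert threshold = observed baseline plus a relative margin (default 50%)."""
    return baseline_ms * (1 + margin)
```

For the 800ms baseline above, this gives a 1200ms threshold, so ordinary variance stays quiet while a genuine slowdown still triggers.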
Separate Critical from Noise
Route critical alerts to SMS/phone. Route everything else to Slack or email. When your phone buzzes, you know it's real. When Slack pings, you can check when convenient.
Review and Tune Regularly
Every week or two, review your alerts:
- How many alerts did you get?
- How many were actionable?
- Did you miss any real incidents?
- What thresholds need adjusting?
Rule of thumb: If more than 20% of your alerts are false positives, your configuration needs work.
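The 20% rule of thumb is easy to automate as part of your weekly review. A minimal sketch:

```python
def needs_tuning(total_alerts: int, false_positives: int) -> bool:
    """Rule of thumb: more than 20% false positives means thresholds need work."""
    if total_alerts == 0:
        return False  # no alerts, nothing to tune
    return false_positives / total_alerts > 0.20
```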
Setting Up Alerts for Teams
Don't Rely on One Person
If alerts go to one email, what happens when that person is on vacation? Sick? Asleep? Always have multiple recipients.
Shared Channels
A dedicated Slack/Discord channel (#monitoring-alerts) gives the whole team visibility. Anyone can see and respond to issues. This also creates a searchable history of incidents.
Escalation Strategy
Define a simple escalation path:
First 5 minutes: Slack notification to team channel
After 15 minutes (still down): SMS to on-call person
After 30 minutes: Phone call to team lead
After 1 hour: All hands notification
Most tools support escalation policies. If yours doesn't, at minimum ensure multiple channels are active for critical alerts.
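The escalation path above is just a list of time thresholds and actions. A sketch (the minute marks come from the path above; channel and recipient names are placeholders):

```python
# Minutes of confirmed downtime -> escalation action.
ESCALATION = [
    (0, "Slack notification to team channel"),
    (15, "SMS to on-call person"),
    (30, "Phone call to team lead"),
    (60, "All-hands notification"),
]

def due_actions(minutes_down: int) -> list[str]:
    """All actions that should have fired after `minutes_down` of downtime."""
    return [action for after, action in ESCALATION if minutes_down >= after]
```

Twenty minutes into an outage, the Slack and SMS steps are due; at an hour, all four are.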
When You're a Solo Founder
No team to escalate to? Set up:
- Slack + email for daytime awareness
- SMS for critical after-hours issues
- A status page so users know you're aware (even if you're asleep)
Create one now: Free status page generator or Emergency status page for when things go wrong fast.
What Good Alert Messages Look Like
An alert should tell you what happened, where, and how to start investigating—without overwhelming you.
Bad Alert
Alert: Check Failed
Monitor #47 is down.
Which monitor? What failed? How long? Useless at 3 AM.
Good Alert
🔴 DOWN: api.example.com/health
Status: HTTP 503 (Service Unavailable)
Duration: Down for 8 minutes
Regions affected: EU, US-East (Asia OK)
Last successful check: 10:34 UTC
Dashboard: https://perkydash.com/dashboard/...
Now you know exactly what's wrong and where to look.
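If you send alerts through a webhook, a small formatter can produce the structure shown above: what happened, where, for how long, and where to investigate. A sketch (field names are illustrative):

```python
def format_down_alert(url: str, status: str, minutes_down: int,
                      regions_down: list[str], dashboard: str) -> str:
    """Build a down-alert message: what failed, where, how long, where to look."""
    return "\n".join([
        f"🔴 DOWN: {url}",
        f"Status: {status}",
        f"Duration: Down for {minutes_down} minutes",
        f"Regions affected: {', '.join(regions_down)}",
        f"Dashboard: {dashboard}",
    ])
```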
Recovery Alert
✅ RECOVERED: api.example.com/health
Status: HTTP 200 OK
Total downtime: 12 minutes
Response time: 245ms (normal)
Recovery alerts close the loop. You know the issue is resolved without manually checking.
Recommended Alert Configurations
Solo Founder / Side Project
| Situation | Channel |
|---|---|
| Site down (confirmed) | Email + SMS |
| Site recovered | Email |
| SSL expiring (30 days) | Email |
| SSL expiring (7 days) | Email + SMS |
Small Team / SaaS Product
| Situation | Channel |
|---|---|
| Site down (confirmed) | Slack #alerts + SMS to on-call |
| Site slow (degraded) | Slack #alerts |
| Site recovered | Slack #alerts + Email |
| SSL/Domain expiring | Slack #ops + Email to admin |
| Cron job missed | Slack #alerts |
Agency (Multiple Clients)
| Situation | Channel |
|---|---|
| Client site down | Slack #client-name + SMS to account manager |
| Client SSL expiring | Slack #ops + Email to client (optional) |
| All recovered | Slack #client-name |
Manage client alerts separately: Client views for agencies.
Frequently Asked Questions
How many alerts per week is normal?
For a well-configured setup, 0-3 actionable alerts per week is healthy. If you're getting more than 5 per week, either your infrastructure has real issues or your alert thresholds need tuning. More than 10 per week almost certainly means false positives are drowning out real problems.
Should I get alerts for recovery too?
Yes, always. Recovery alerts close the loop—you know the issue is resolved without manually checking. Send recovery alerts to the same channels as down alerts, but at lower urgency (Slack/email, not SMS).
What's the best channel for overnight alerts?
SMS for critical issues only. Don't send Slack or email alerts overnight—they'll pile up unread. For truly critical systems, a phone call integration (like PagerDuty) ensures you wake up for emergencies. For most small businesses, SMS is sufficient.
How do I test that my alerts actually work?
Most monitoring tools have a "test notification" feature that sends a sample alert to each configured channel. Use it after setup and periodically after that. Also verify alerts reach multiple team members, not just the person who configured them.
Start Simple, Iterate
Don't over-engineer your alert setup on day one. Start with email + one team channel for everything. As you learn what's noisy and what's critical, refine your severity levels and channel routing.
The goal is simple: when something important breaks, the right person knows within minutes. Everything else is optimization.