Channel selection, escalation paths, severity definitions, and post-incident checklists.

When something breaks, clear communication is just as important as the technical fix. This chapter covers how to call for help, structure incident messages, manage escalation, and close the loop after resolution.
Where you post depends on the scope of the incident:
🚨 Incident Alert
• Issue: [Brief description]
• Impact: [Affected services/users]
• Environment: [Dev/Staging/Prod]
• Current status: [Investigating/Identified/In Progress]
• Severity: [P1/P2/P3/P4]
• Started at: [Time]
CC: @oncall @teamlead
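
If alerts are posted by a bot or script, the template above could be filled in programmatically so that no field is forgotten. The following is a minimal sketch, not a prescribed implementation: the `IncidentAlert` class and its field names are hypothetical and simply mirror the template's labels.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_SEVERITIES = ("P1", "P2", "P3", "P4")


@dataclass
class IncidentAlert:
    """Hypothetical container mirroring the fields of the alert template."""

    issue: str
    impact: str
    environment: str  # Dev, Staging, or Prod
    status: str       # Investigating, Identified, or In Progress
    severity: str     # P1 to P4
    started_at: str
    cc: tuple[str, ...] = ("@oncall", "@teamlead")

    def __post_init__(self) -> None:
        # Reject severities the template does not define.
        if self.severity not in VALID_SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    def render(self) -> str:
        """Return the alert as plain text in the template's line order."""
        lines = [
            "🚨 Incident Alert",
            f"• Issue: {self.issue}",
            f"• Impact: {self.impact}",
            f"• Environment: {self.environment}",
            f"• Current status: {self.status}",
            f"• Severity: {self.severity}",
            f"• Started at: {self.started_at}",
            "CC: " + " ".join(self.cc),
        ]
        return "\n".join(lines)


if __name__ == "__main__":
    alert = IncidentAlert(
        issue="Production API returning 500s",
        impact="All users",
        environment="Prod",
        status="Investigating",
        severity="P1",
        started_at="14:05 UTC",
    )
    print(alert.render())
```

Validating the severity at construction time means a malformed alert fails before it is posted rather than after.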
Need assistance with:
• Component: [System/Service name]
• Problem: [Specific issue]
• Attempted: [Actions taken so far]
• Logs: [Link to logs]
• Access needed: [Yes/No]
Priority: [Urgent/High/Medium/Low]
| Severity | Update Frequency |
|---|---|
| P1 | Every 30 minutes |
| P2 | Every 60 minutes |
| P3 | Every 2 hours |
| P4 | Final update post-resolution |
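
The update cadence in the table can be turned into a small reminder helper. This is a sketch under stated assumptions: the function name `next_update_due` is hypothetical, and P4 is modelled as having no recurring interval because the table only calls for a final update after resolution.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Intervals taken from the update-frequency table.
# None means "no recurring update, only a final one post-resolution".
UPDATE_INTERVALS: dict[str, timedelta | None] = {
    "P1": timedelta(minutes=30),
    "P2": timedelta(minutes=60),
    "P3": timedelta(hours=2),
    "P4": None,
}


def next_update_due(severity: str, last_update: datetime) -> datetime | None:
    """Return when the next status update is due, or None for P4."""
    if severity not in UPDATE_INTERVALS:
        raise ValueError(f"unknown severity: {severity!r}")
    interval = UPDATE_INTERVALS[severity]
    if interval is None:
        return None
    return last_update + interval


if __name__ == "__main__":
    posted = datetime(2024, 1, 1, 12, 0)
    print(next_update_due("P1", posted))  # 2024-01-01 12:30:00
    print(next_update_due("P4", posted))  # None
```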
L1 Support (15 min) → L2 Engineer (30 min) → Team Lead (45 min) → Department Head (60 min)
Each transition should include a handoff message summarising what has been tried so far.
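
The escalation path can also be expressed as a lookup on elapsed time. The sketch below assumes the figures in parentheses are cumulative minutes since the incident started, so that each level owns the incident until its own deadline passes; if your team reads them as per-level time budgets instead, the thresholds would need adjusting. The names `current_owner` and `handoff_message` are hypothetical.

```python
# (role, deadline in minutes since incident start), in escalation order.
# Assumption: the deadlines are cumulative, not per-level budgets.
ESCALATION_PATH = [
    ("L1 Support", 15),
    ("L2 Engineer", 30),
    ("Team Lead", 45),
    ("Department Head", 60),
]


def current_owner(elapsed_minutes: float) -> str:
    """Return the role that should own the incident after the given time."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    for role, deadline in ESCALATION_PATH:
        if elapsed_minutes < deadline:
            return role
    # Past the final deadline the incident stays with the last level.
    return ESCALATION_PATH[-1][0]


def handoff_message(from_role: str, to_role: str, attempted: list[str]) -> str:
    """Build a handoff note summarising what has been tried so far."""
    tried = "\n".join(f"• {step}" for step in attempted) or "• Nothing yet"
    return f"Handoff: {from_role} → {to_role}\nAttempted so far:\n{tried}"


if __name__ == "__main__":
    print(current_owner(20))
    print(handoff_message("L1 Support", "L2 Engineer", ["Restarted the service"]))
```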
| Severity | Definition | Example |
|---|---|---|
| P1 | Service down, customer impact | Production API returning 500s for all users |
| P2 | Degraded service, workaround exists | Slow response times, users can retry |
| P3 | Minor impact, non-critical | A single internal tool is unavailable |
| P4 | Minimal impact, can be scheduled | A cosmetic bug in the admin panel |
| Severity | Primary Channel | Secondary Channel | Update Frequency |
|---|---|---|---|
| P1 | #incidents | Team Slack/Teams | 30 min |
| P2 | #team-channel | | 60 min |
| P3 | #team-channel | - | 120 min |
| P4 | Squad channel | - | Daily |
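
For tooling that posts on a responder's behalf, the channel mapping could be encoded as a simple lookup. This is an illustrative sketch only: `channels_for` is a hypothetical helper, the channel names are copied from the table, and a secondary channel is modelled only for P1 because that is the only one the table states clearly.

```python
# Primary channel per severity, as listed in the table.
PRIMARY_CHANNEL = {
    "P1": "#incidents",
    "P2": "#team-channel",
    "P3": "#team-channel",
    "P4": "Squad channel",
}

# Only P1 has a clearly stated secondary channel.
SECONDARY_CHANNEL = {
    "P1": "Team Slack/Teams",
}


def channels_for(severity: str) -> list[str]:
    """Return the channels to post in, primary first."""
    key = severity.upper()
    if key not in PRIMARY_CHANNEL:
        raise ValueError(f"unknown severity: {severity!r}")
    channels = [PRIMARY_CHANNEL[key]]
    if key in SECONDARY_CHANNEL:
        channels.append(SECONDARY_CHANNEL[key])
    return channels


if __name__ == "__main__":
    for sev in ("P1", "P2", "P3", "P4"):
        print(sev, channels_for(sev))
```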
Do:
Don’t: