Incident Response Communication

Channel selection, escalation paths, severity definitions, and post-incident checklists.




When something breaks, clear communication is just as important as the technical fix. This chapter covers how to call for help, structure incident messages, manage escalation, and close the loop after resolution.

Calling for Help

Channel Selection Guidelines

Where you post depends on the scope of the incident:

  1. Private Group (for initial triage)
    • Single environment impact
    • Minor component affected
    • Limited scope (fewer than 5 people involved)
    • Uncertain team ownership
  2. Public Channel (mandatory if any of the following apply)
    • Multi-environment impact
    • Critical component affected
    • Multiple teams involved
    • Potential widespread impact
    • Escalated incidents
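The public-channel criteria above can be read as a simple "any of these" rule. The following is a minimal sketch of that reading, not an official tool: the `IncidentScope` type, its field names, and `choose_channel` are hypothetical. The "fewer than 5 people" guideline is left out because the list does not say what should happen at five or more.

```python
from dataclasses import dataclass


@dataclass
class IncidentScope:
    # Hypothetical fields mirroring the public-channel criteria in the list.
    environments_affected: int
    critical_component: bool
    teams_involved: int
    widespread_impact_possible: bool
    escalated: bool


def choose_channel(scope: IncidentScope) -> str:
    """Return "public" if any mandatory public-channel criterion holds, else "private"."""
    public_triggers = (
        scope.environments_affected > 1,   # multi-environment impact
        scope.critical_component,          # critical component affected
        scope.teams_involved > 1,          # multiple teams involved
        scope.widespread_impact_possible,  # potential widespread impact
        scope.escalated,                   # escalated incidents
    )
    return "public" if any(public_triggers) else "private"
```

For example, a single-environment issue on a minor component owned by one team stays in the private group, while the same issue on a critical component moves to the public channel.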

Team Engagement Protocol

Development Team Engagement

Infrastructure Team Engagement

Cross-team Collaboration

Message Templates

Initial Alert

🚨 Incident Alert
• Issue: [Brief description]
• Impact: [Affected services/users]
• Environment: [Dev/Staging/Prod]
• Current status: [Investigating/Identified/In Progress]
• Severity: [P1/P2/P3/P4]
• Started at: [Time]
CC: @oncall @teamlead
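If alerts are posted by a script or bot, the template can be filled in programmatically. The sketch below is illustrative only: `render_initial_alert` is a hypothetical helper, and the allowed values are taken directly from the bracketed options in the template.

```python
ENVIRONMENTS = {"Dev", "Staging", "Prod"}
STATUSES = {"Investigating", "Identified", "In Progress"}
SEVERITIES = {"P1", "P2", "P3", "P4"}


def render_initial_alert(issue, impact, environment, status, severity,
                         started_at, cc=("@oncall", "@teamlead")):
    """Fill in the Initial Alert template, rejecting values the template does not list."""
    checks = (
        ("environment", environment, ENVIRONMENTS),
        ("status", status, STATUSES),
        ("severity", severity, SEVERITIES),
    )
    for label, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{label} must be one of {sorted(allowed)}, got {value!r}")
    return "\n".join([
        "🚨 Incident Alert",
        f"• Issue: {issue}",
        f"• Impact: {impact}",
        f"• Environment: {environment}",
        f"• Current status: {status}",
        f"• Severity: {severity}",
        f"• Started at: {started_at}",
        "CC: " + " ".join(cc),
    ])
```

Validating the fixed-choice fields up front keeps malformed alerts (for example, a severity of "P0") from reaching the channel.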

Assistance Request

Need assistance with:
• Component: [System/Service name]
• Problem: [Specific issue]
• Attempted: [Actions taken so far]
• Logs: [Link to logs]
• Access needed: [Yes/No]
Priority: [Urgent/High/Medium/Low]

Incident Management Process

1. Detection and Triage

2. Communication Flow

Severity    Update Frequency
P1          Every 30 minutes
P2          Every 60 minutes
P3          Every 2 hours
P4          Final update post-resolution
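The update cadence lends itself to a small reminder calculation. This is a sketch under the assumption that the interval is measured from the previous update; `UPDATE_INTERVALS` and `next_update_due` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Intervals from the table above. P4 has no periodic cadence,
# only a final update after resolution, so it is not listed.
UPDATE_INTERVALS = {
    "P1": timedelta(minutes=30),
    "P2": timedelta(minutes=60),
    "P3": timedelta(hours=2),
}


def next_update_due(severity: str, last_update: datetime) -> Optional[datetime]:
    """Return when the next status update is due, or None for P4."""
    if severity == "P4":
        return None
    return last_update + UPDATE_INTERVALS[severity]
```

An unknown severity raises `KeyError`, which is deliberate here: silently picking a default cadence could hide a mislabelled incident.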

3. Escalation Path

L1 Support (15 min) → L2 Engineer (30 min) → Team Lead (45 min) → Department Head (60 min)

Each transition should include a handoff message summarising what has been tried so far.
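The escalation path can be modelled as an ordered list of deadlines. The sketch below rests on one loudly stated assumption: each figure in parentheses is the cumulative number of minutes since the incident started at which that level hands over to the next. The text does not say whether the timers are cumulative or per level, so adjust accordingly. `current_owner` and `handoff_message` are hypothetical helpers.

```python
# (role, minutes since incident start at which this role escalates onward).
# ASSUMPTION: the timings are cumulative from incident start.
ESCALATION_PATH = [
    ("L1 Support", 15),
    ("L2 Engineer", 30),
    ("Team Lead", 45),
    ("Department Head", 60),
]


def current_owner(minutes_elapsed: int) -> str:
    """Return the role that should own the incident after the given elapsed time."""
    for role, deadline in ESCALATION_PATH:
        if minutes_elapsed < deadline:
            return role
    # Past the last deadline the incident stays with the final level.
    return ESCALATION_PATH[-1][0]


def handoff_message(from_role: str, to_role: str, attempted: list) -> str:
    """Build the handoff note summarising what has been tried so far."""
    tried = "; ".join(attempted) if attempted else "nothing yet"
    return f"Handoff {from_role} -> {to_role}. Tried so far: {tried}"
```

Under this reading, an incident still unresolved at minute 15 belongs to the L2 Engineer, and the L1 responder posts a handoff note listing the actions already taken.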

Incident Severity Definitions

Severity    Definition                              Example
P1          Service down, customer impact           Production API returning 500s for all users
P2          Degraded service, workaround exists     Slow response times, users can retry
P3          Minor impact, non-critical              A single internal tool is unavailable
P4          Minimal impact, can be scheduled        A cosmetic bug in the admin panel

Communication Channels Matrix

Severity    Primary Channel    Secondary Channel    Update Frequency
P1          #incidents         Team Slack/Teams     30 min
P2          #team-channel      Email                60 min
P3          #team-channel      -                    120 min
P4          Squad channel      -                    Daily
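For tooling that needs to know where to post, the matrix can be encoded as a lookup table. This is a sketch only: `Route`, `ROUTES`, and `channels_for` are hypothetical names, and it assumes the blank secondary-channel cells for P3 and P4 mean that no secondary channel is required.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    primary: str
    secondary: Optional[str]  # None: no secondary channel (assumed for P3/P4)
    update_frequency: str


# Values copied from the Communication Channels Matrix.
ROUTES = {
    "P1": Route("#incidents", "Team Slack/Teams", "30 min"),
    "P2": Route("#team-channel", "Email", "60 min"),
    "P3": Route("#team-channel", None, "120 min"),
    "P4": Route("Squad channel", None, "Daily"),
}


def channels_for(severity: str) -> list:
    """Return the channels to notify for a severity, primary channel first."""
    route = ROUTES[severity.upper()]
    return [c for c in (route.primary, route.secondary) if c is not None]
```

Keeping the matrix in one data structure means a change to a channel name is made in a single place rather than scattered across scripts.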

Post-Incident Actions

Immediate Actions

Documentation Requirements

Follow-up Tasks

Best Practices

Do:

Don’t:

Chapter 5 of 5