Description
We are seeking a highly skilled Incident Manager to lead Major Incident Management (MIM) and ensure rapid restoration of services during critical outages. This role is responsible for minimizing business impact, driving structured incident response, and continuously improving service reliability.
The Incident Manager will act as the central point of coordination during high-severity incidents, working across engineering, operations, and business teams. This role also contributes to problem management, change coordination, and operational excellence initiatives, with a primary focus on incident leadership and service recovery.
Responsibilities
Incident Management (Primary Focus)
- Lead and coordinate Major Incident response (SEV1/SEV2), ensuring rapid service restoration and minimal business disruption
- Act as Incident Commander during critical incidents, driving real-time decision-making and resolution efforts
- Facilitate incident bridge calls, ensuring clear roles, timelines, and accountability
- Establish and enforce incident management processes, including severity classification, escalation paths, and response protocols
- Provide timely and structured communication to stakeholders, including executive leadership, during major incidents
- Ensure accurate documentation of incidents, including timelines, actions taken, and resolution outcomes
Post-Incident & Problem Management
- Facilitate blameless post-incident reviews (PIRs) and root cause analysis (RCA)
- Identify systemic issues and drive corrective and preventive actions to closure
- Maintain a knowledge base of known issues, workarounds, and resolutions
- Analyze incident trends to proactively reduce recurrence and improve system reliability
Change & Release Coordination
- Partner with change management teams to assess risk and operational impact of planned changes
- Support major releases and production changes, ensuring readiness and rollback planning
- Conduct Change Advisory Board (CAB) meetings for change risk review and approval
Monitoring & Operational Excellence
- Collaborate with engineering teams to improve monitoring, alerting, and observability
- Ensure alerts are actionable, low in noise, and aligned with business impact
- Drive continuous improvement of incident response processes, tooling, and automation
- Promote best practices for system reliability, fault tolerance, and disaster recovery
Metrics & Reporting
- Track and report on key performance metrics such as MTTR (Mean Time to Resolution), MTTA (Mean Time to Acknowledge), and incident recurrence rates
- Ensure adherence to SLAs/SLOs and identify opportunities for improvement
- Provide regular reporting and insights to leadership on incident trends and system health
Collaboration & Leadership
- Act as a subject matter expert (SME) for Incident Management practices
- Mentor teams on incident response best practices and operational readiness
- Coordinate across cross-functional teams, including engineering, infrastructure, security, and vendors
On-Call Responsibilities
- Participate in a 24/7 on-call rotation as an escalation Incident Manager for critical incidents
Qualifications
Required
- Bachelor’s degree in Computer Science, Information Technology, or a related field
- Proven experience in Incident Management or a similar role in a production environment
- Strong experience leading Major Incident Management (MIM) processes
- Solid understanding of ITIL frameworks (Incident, Problem, Change Management)
- Knowledge of cloud platforms, preferably AWS
- Experience with distributed systems, microservices architecture, and modern application stacks
- Good understanding of monitoring and observability tools (e.g., CloudWatch, Dynatrace, Splunk, Nagios)
- Familiarity with incident management tools (e.g., Jira, ServiceNow, PagerDuty)
- Excellent communication skills with the ability to engage both technical teams and executive stakeholders
- Strong analytical and problem-solving skills in high-pressure environments
Preferred
hackajob is partnering with Verisk Analytics to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.