Incident Manager

Jersey City, NJ, United States
Up to $130,000/year
Operations Manager, Site Reliability Engineer, IT Service Desk Manager, Production Analyst, Operations Engineer
Actively hiring

hackajob is partnering with Verisk Analytics to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

 

Description

We are seeking a highly skilled Incident Manager to lead Major Incident Management (MIM) and ensure rapid restoration of services during critical outages. This role is responsible for minimizing business impact, driving structured incident response, and continuously improving service reliability.

The Incident Manager will act as the central point of coordination during high-severity incidents, working across engineering, operations, and business teams. This role also contributes to problem management, change coordination, and operational excellence initiatives, with a primary focus on incident leadership and service recovery.

Responsibilities

Incident Management (Primary Focus)

  • Lead and coordinate Major Incident response (SEV1/SEV2), ensuring rapid service restoration and minimal business disruption
  • Act as Incident Commander during critical incidents, driving real-time decision-making and resolution efforts
  • Facilitate incident bridge calls, ensuring clear roles, timelines, and accountability
  • Establish and enforce incident management processes, including severity classification, escalation paths, and response protocols
  • Provide timely and structured communication to stakeholders, including executive leadership, during major incidents
  • Ensure accurate documentation of incidents, including timelines, actions taken, and resolution outcomes

Post-Incident & Problem Management

  • Facilitate blameless post-incident reviews (PIRs) and root cause analysis (RCA)
  • Identify systemic issues and drive corrective and preventive actions to closure
  • Maintain a knowledge base of known issues, workarounds, and resolutions
  • Analyze incident trends to proactively reduce recurrence and improve system reliability

Change & Release Coordination

  • Partner with change management teams to assess risk and operational impact of planned changes
  • Support major releases and production changes, ensuring readiness and rollback planning
  • Conduct change advisory board (CAB) meetings for change risk review and approval

Monitoring & Operational Excellence

  • Collaborate with engineering teams to improve monitoring, alerting, and observability
  • Ensure alerts are actionable, reduce noise, and align with business impact
  • Drive continuous improvement of incident response processes, tooling, and automation
  • Promote best practices for system reliability, fault tolerance, and disaster recovery

Metrics & Reporting

  • Track and report on key performance metrics such as MTTR (Mean Time to Resolution), MTTA (Mean Time to Acknowledge), and incident recurrence rates
  • Ensure adherence to SLAs/SLOs and identify opportunities for improvement
  • Provide regular reporting and insights to leadership on incident trends and system health

Collaboration & Leadership

  • Act as a subject matter expert (SME) for Incident Management practices
  • Mentor teams on incident response best practices and operational readiness
  • Coordinate across cross-functional teams, including engineering, infrastructure, security, and vendors

On-Call Responsibilities

  • Participate in a 24/7 on-call rotation as an escalation Incident Manager for critical incidents

Qualifications

Required

  • Bachelor’s degree in Computer Science, Information Technology, or a related field
  • Proven experience in Incident Management or a similar role in a production environment
  • Strong experience leading Major Incident Management (MIM) processes
  • Solid understanding of ITIL frameworks (Incident, Problem, Change Management)
  • Knowledge of cloud platforms, preferably AWS
  • Experience with distributed systems, microservices architecture, and modern application stacks
  • Good understanding of monitoring and observability tools (e.g., CloudWatch, Dynatrace, Splunk, Nagios)
  • Familiarity with incident management tools (e.g., Jira, ServiceNow, PagerDuty)
  • Excellent communication skills with the ability to engage both technical teams and executive stakeholders
  • Strong analytical and problem-solving skills in high-pressure environments

Preferred

  • ITIL certification (Foundation or higher)
  • AWS certification (e.g., Cloud Practitioner or Associate level)
  • Experience with CI/CD pipelines and DevOps practices
  • Experience leveraging automation or AI tools to enhance incident response and analysis
  • Understanding of networking, storage, and infrastructure concepts

#LI-Hybrid
