Incident Response Manager

Centennial, Colorado, USA
Up to $90,000/ year
Any
Actively hiring


hackajob is partnering with Comcast to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

 

Comcast Technology Solutions (CTS) has a great opportunity for an Incident Response Manager with a history of leading major incident responses effectively, within an established structure, to join our Global Operations team. You will work with operations, professional services, and product engineering to respond to incidents, help those teams learn from them, and continuously improve the Cloud Video Platform (CVP).

 

 

Our Incident Response Manager (aka Incident Commander) leads and directs the team during major incidents related to the Cloud Video Platform and serves as the problem manager for post-incident review and ongoing management. You will prioritize incidents and manage resources during major incidents to ensure prompt and efficient resolution. 

This Engineer 2, Engineering Operations role is an individual contributor position: you will lead teams through incidents but will not have direct reports. The role is in-office on a hybrid basis, with typical business hours during the week, and includes an on-call rotation (generally one weekend every 5-6 weeks, with on-call pay).

 

About the team 
We pride ourselves on being a global, diverse, and innovative team with Operations hubs in the United States, London, and Chennai. We foster an environment where employees are empowered to bring forward new ideas. Different perspectives are valued because they drive efficient and effective solutions that benefit the client, the business, and employees. Our follow-the-sun model provides an established workflow across time zones and promotes both work/life balance and business continuity. With supportive leadership and a collaborative team culture, we offer meaningful and challenging work that truly makes an impact. We’re proud to support major global streaming events like Formula 1, the UEFA Euros, the Olympics and more.

 

RESPONSIBILITIES 

  • Lead service incident investigations through to resolution to keep CTS product offerings available 24x7.

  • Ensure all key resources are engaged and focused on remediation.

  • Follow our incident management framework in a consistent manner.

  • Work with a team of incident managers in a follow-the-sun rotation to partner closely with on-call service engineers and engineering leaders around the world. 

  • Drive the incident communications strategy and execution across CTS.

  • Drive, and rely on, key performance measures, which will be critical to validating performance and service health. These metrics include:

      • Communications speed

      • Communications quality

      • Communications accuracy

      • Stakeholder satisfaction with the communications provided

  • Drive programs to improve alert coverage and accuracy, reduce resolution times, work with owners to complete post-mortems on a timely basis, track the aging of repair items, and define other incident management and problem management workflows.

  • Partner closely with product engineers across CTS to promote a culture of operational excellence, including hosting regular training and reviewing KPIs in rhythm-of-business meetings.

  • Contribute to using and extending the CTS framework to build an incident management toolset. 

  • Champion the industry-standard approach to managing incidents, and guide your co-workers to embrace its value.

  • Other duties and responsibilities as assigned 

 

ABOUT YOU 

Our people are the most important part of our business. We are fundamentally looking for forward-thinking, enthusiastic problem solvers: people who love a challenge, constantly evaluate and question, and above all, love to ship a product that solves real problems. While these characteristics outweigh any specific technical skills, you should be able to demonstrate some of the following:

 

Must Have Skills:

  • 4+ years of experience managing technical incidents (such as outages) and running incident management programs, preferably in large-scale global environments. 

  • Experience in a high-availability SaaS environment.

  • 3+ years of experience working with services running in public cloud platforms such as AWS, Azure, or Google Cloud. 

  • Understanding of web services (HTTP, Web API, Web protocols). 

  • Familiarity with system observability tools (Datadog or Splunk).

  • Previous experience with the ICS (Incident Command System) framework and familiarity with ServiceNow.

  • You can be a calming voice that people listen to in a storm. You rush towards fires, not away from them.

  • Exceptional verbal and written communication skills. 

  • ITIL certification will be highly valued. 
