Incident Management Engineer or Reliability Engineer or DevOps Engineer

CHICAGO, IL, US
Up to $400,000/year
DevOps Engineer, Operations Engineer, Site Reliability Engineer, Application Support Engineer
Actively hiring

hackajob is partnering with Comcast to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

•    Job Posting Title

Sr. Software Engineer - Incident Management - Chicago, IL OR Denver, CO - Onsite
•    Job Description Summary
We’re looking for a Sr. Software Engineer with Incident Management experience to be the central point of accountability for Incident Management in Software Engineering. This role is special because it combines deep technical expertise with strong collaboration and communication skills, ensuring we not only resolve incidents quickly but also turn them into long-term improvements.

You’ll split your time between technical ownership (leading root cause analysis, retrospectives, and system hardening) and cross-functional collaboration (working with Engineering teams on improvement plans, and with the COO and client-facing teams on impact analysis and clear communications).

This role is key to building a resilient, reliable, and learning-focused culture where every incident strengthens our systems, our processes, and our customer trust. As our customer base grows globally, you’ll also help us ensure consistent, high-quality service across time zones and regions.

This role is about creating consistency, building trust, and making sure escalations become opportunities to improve – not just problems to patch.
•    Job Description
Technical Ownership (50%)
o    Own the escalation lifecycle within Engineering, from initial escalation through resolution.
o    Lead root cause analysis (RCA) sessions that dig deeper than symptoms and deliver long-lasting fixes.
o    Facilitate retrospectives and follow-ups, turning lessons learned into clear improvement plans.
o    Define and track metrics (incident frequency, resolution times, client impact), and make them visible through dashboards and reports.
o    Partner with teams to strengthen systems through tooling, automation, and platform hardening.
o    Keep a cross-platform perspective (TV, Data, Beeswax, Strata) to spot patterns and systemic issues.

Collaboration & Communication (50%)
o    Lead Incident Management reviews and improvement sessions with leadership, highlighting what happened, why, and how we’ll prevent it next time.
o    Support a culture of learning and transparency by running training, knowledge-sharing, and quality workshops.
o    Act as the single voice for Engineering in incident management, making sure communication is consistent and clear at all levels.
o    Collaborate with Engineering (Tier 2/3) to resolve incidents quickly and share learnings across teams.
o    Partner with Operations (Tier 1) to fine-tune escalation paths and help reduce unnecessary hand-offs.
o    Work closely with the COO team to analyze client impact and provide crisp, timely updates during incidents.

Requirements
o    6+ years of technical experience in software engineering, site reliability, or production operations.
o    Proven track record of managing the full software development lifecycle (SDLC), from requirements gathering to production release.
o    Hands-on understanding of full-stack components:
1. Frontend/UI frameworks and client experience
2. APIs & service layers
3. Database layer (SQL/NoSQL, data modeling, performance tuning)
4. Backend servers and distributed systems
5. Big data & ETL pipelines (batch and streaming)
o    Strong knowledge of incident management and its tooling (PagerDuty, Jira, Datadog, Splunk, ServiceNow).
o    Confidence to dive deep with engineers while also translating technical details into clear business context for executives and clients.
o    Experience operating in global, multi-time-zone environments with diverse customer and platform needs.

Employees at all levels are expected to:
o    Understand our Operating Principles; make them the guidelines for how you do your job.
o    Own the customer experience - think and act in ways that put our customers first, give them seamless digital options at every touchpoint, and make them promoters of our products and services.
o    Know your stuff - be enthusiastic learners, users and advocates of our game-changing technology, products and services, especially our digital tools and experiences.
o    Win as a team - make big things happen by working together and being open to new ideas.
o    Be an active part of the Net Promoter System - a way of working that brings more employee and customer feedback into the company - by joining huddles, making callbacks and helping us elevate opportunities to do better for our customers.
o    Drive results and growth.
o    Support a culture of inclusion in how you work and lead.
o    Do what's right for each other, our customers, investors and our communities.

Disclaimer:
o    This information has been designed to indicate the general nature and level of work performed by employees in this role. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications.