
Manager of Site Reliability Engineer

Hyderabad, Telangana, IND
Operations Engineer, Operations Manager, Site Reliability Engineer, DevOps Engineer, Engineering Manager, DevOps Leader
Actively hiring

JPMorganChase

hackajob is partnering with JPMorganChase to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

 
JOB DESCRIPTION

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As an SRE Manager at JPMorgan Chase within Consumer & Community Banking, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.


Job Responsibilities:

  • Define and enforce quality gates across requirements, design, secure coding, testing, release, and post-production monitoring. Translate business objectives into clear, testable requirements that include reliability, availability, performance, security, and observability.
  • Establish and manage SLOs/SLIs and error budgets; ensure they are integrated into product roadmaps and delivery plans. Challenge Product Owners and teams to meet a rigorous, objective Definition of Done (DoD) before release.
  • Sample DoD checklist: SLOs defined and monitored; alerts tuned; runbooks and escalation paths in place; automated tests (unit, integration, security) passing; performance and capacity validated; resilience and failover tested; rollback verified; vulnerability findings remediated; compliance controls and audit artifacts complete; documentation and support readiness confirmed.
  • Lead operational readiness reviews and triage risks; ensure timely remediation and prevention of recurrence through root-cause analysis and auto-remediation.
  • Maintain logging, alerting, and monitoring platforms; ensure dashboards provide health and performance visibility. Govern CI/CD pipeline controls for security, reliability, and change management; promote automation to eliminate toil.
  • Lead and participate in critical incident response (including outside business hours when needed); drive post-incident reviews and resilience improvements. Monitor delivery health and operational KPIs; lead continuous improvement across teams and products.
  • Oversee capacity planning and resilience management for large-scale, distributed systems. Partner with engineering on public cloud best practices (AWS or equivalent) for compute, storage, networking, messaging, automation (CloudFormation, Terraform), and data services.
  • Build a culture of collaboration, reliability, and continuous improvement; coach teams to adopt DevOps and SRE principles. Partner with regional engineering leaders to drive operational best practices and consistent execution. Provide concise, outcome-focused updates to management and stakeholders; influence decisions across Product, Engineering, SRE, and Security.

Required Qualifications, Capabilities, and Skills

  • Formal training or certification with 5+ years supporting critical finance-focused applications in large-scale environments and managing and mentoring teams.
  • Solid understanding of AI-assisted solutions to accelerate root cause analysis and reduce overall TTX, with appropriate validation and human judgment.
  • Experience with monitoring/logging tools (e.g., Splunk, AppDynamics) and dashboard technologies.
  • Strong grasp of SDLC, secure development, DevOps/CI/CD tooling; capable of implementing top-tier continuous improvement with root-cause analysis and auto-remediation.
  • Effective under pressure; accountable, with excellent stakeholder management and communication skills.
  • This position may require HSA system access. Enhanced screening (criminal and credit background checks, and/or other screening) is required prior to employment and annually thereafter.
  • Global team collaboration, with flexibility to engage during critical incidents outside standard business hours.
  • Experience implementing and managing SLOs/SLIs, error budgets, and operational readiness reviews for distributed systems, including leading post-incident analysis and resilience improvements.
  • Deep expertise in public cloud platforms (AWS or equivalent), infrastructure automation tools (CloudFormation, Terraform), and capacity planning for large-scale environments, with a track record of driving DevOps and SRE adoption across teams.

Preferred Qualifications

  • Splunk Administrator certification desired.