JOB DESCRIPTION
Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.
As an SRE Manager at JPMorgan Chase within Consumer & Community Banking, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team's strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.
Job Responsibilities:
- Define and enforce quality gates across requirements, design, secure coding, testing, release, and post-production monitoring. Translate business objectives into clear, testable requirements that include reliability, availability, performance, security, and observability.
- Establish and manage SLOs/SLIs and error budgets; ensure they are integrated into product roadmaps and delivery plans. Challenge Product Owners and teams to meet a rigorous, objective Definition of Done before release.
- Sample DoD checklist: SLOs defined and monitored; alerts tuned; runbooks and escalation paths in place; automated tests (unit, integration, security) passing; performance and capacity validated; resilience and failover tested; rollback verified; vulnerability findings remediated; compliance controls and audit artifacts complete; documentation and support readiness confirmed.
- Lead operational readiness reviews and triage risks; ensure timely remediation and prevention of recurrence through root-cause analysis and auto-remediation.
- Maintain logging, alerting, and monitoring platforms; ensure dashboards provide health and performance visibility. Govern CI/CD pipeline controls for security, reliability, and change management; promote automation to eliminate toil.
- Lead and participate in critical incident response (including outside business hours when needed); drive post-incident reviews and resilience improvements. Monitor delivery health and operational KPIs; lead continuous improvement across teams and products.
- Oversee capacity planning and resilience management for large-scale, distributed systems. Partner with engineering on public cloud best practices (AWS or equivalent) for compute, storage, networking, messaging, automation (CloudFormation, Terraform), and data services.
- Build a culture of collaboration, reliability, and continuous improvement; coach teams to adopt DevOps and SRE principles. Partner with regional engineering leaders to drive operational best practices and consistent execution. Provide concise, outcome-focused updates to management and stakeholders; influence decisions across Product, Engineering, SRE, and Security.
Required Qualifications, Capabilities, and Skills
- Formal training or certification, with 5+ years of experience supporting critical finance-focused applications in large-scale environments and managing and mentoring teams.
- Solid understanding of AI-assisted solutions to accelerate root cause analysis and reduce overall TTX, with appropriate validation and human judgment.
- Experience with monitoring/logging tools (e.g., Splunk, AppDynamics) and dashboard technologies.
- Strong grasp of SDLC, secure development, DevOps/CI/CD tooling; capable of implementing top-tier continuous improvement with root-cause analysis and auto-remediation.
- Effective under pressure; accountable, with excellent stakeholder management and communication skills.
- This position may require HSA system access. Enhanced screening (criminal and credit background checks, and/or other screening) is required prior to employment and annually thereafter.
- Global team collaboration, with flexibility to engage during critical incidents outside standard business hours.
- Experience implementing and managing SLOs/SLIs, error budgets, and operational readiness reviews for distributed systems, including leading post-incident analysis and resilience improvements.
- Deep expertise in public cloud platforms (AWS or equivalent), infrastructure automation tools (CloudFormation, Terraform), and capacity planning for large-scale environments, with a track record of driving DevOps and SRE adoption across teams.
Preferred Qualifications
- Splunk Administrator certification desired.
ABOUT US
hackajob is partnering with JPMorganChase to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.