hackajob is partnering with American Express to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.
At American Express, our culture is built on a 175-year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. From delivering differentiated products to providing world-class customer service, we operate with a strong risk mindset, ensuring we continue to uphold our brand promise of trust, security, and service.
As part of Team Amex, you'll experience this powerful backing with comprehensive support for your holistic well-being and many opportunities to learn new skills, develop as a leader, and grow your career. Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express.
How will you make an impact in this role?
• Mentors junior Site Reliability Engineers and a cross-functional team of colleagues, fostering a culture of excellence and innovation
• Provides guidance and support to junior engineers, fostering professional growth and development within the team, ensuring adherence to best practices in Site Reliability Engineering
• Manages and oversees collaboration with Software Engineering teams to design, develop, and implement advanced features that enhance system resilience, scalability, and performance, proactively identifying and resolving complex system bottlenecks and failure points
• Leads the development and refinement of sophisticated automation tools and frameworks, including advanced infrastructure as code (IaC) practices, to streamline complex operational workflows, deployment processes, and infrastructure management, significantly reducing manual intervention and ensuring high system efficiency
• Actively engages in and influences high-level architectural design discussions, ensuring that advanced reliability, scalability, and performance considerations are deeply integrated into strategic decision-making processes, and driving the adoption of innovative solutions
• Designs, executes, and oversees comprehensive chaos engineering experiments and advanced resiliency testing, analyzing results to implement improvements that enhance system robustness and recovery capabilities, and mentors colleagues in these practices
• Leads the development, optimization, and maintenance of comprehensive disaster recovery plans and business continuity strategies, ensuring systems can recover quickly and effectively from complex and unexpected disruptions
• Advocates for and implements advanced observability practices, including error budgeting, service-level objectives (SLOs), and service-level indicators (SLIs), contributing to a culture of continuous improvement and reliability, and mentoring colleagues in these practices
• Collaborates with cross-functional teams to enhance customer journeys, ensuring seamless and reliable technology experiences by addressing potential reliability and performance issues proactively, and leading initiatives to improve overall system reliability
• Collaborates and co-creates effectively with teams in product and the business to align technology initiatives with business objectives
Minimum Qualifications:
• Bachelor's degree in Computer Science, Information Technology, Engineering, and/or comparable experience; advanced degree preferred
• 3 years of experience with a modern observability stack (Splunk, Elasticsearch, Prometheus, Grafana)
• 3 years of experience with containerization technologies (e.g., Kubernetes, Docker) and microservices architecture
• 3 years of experience with container orchestration tools (Kubernetes, ECS, Docker Swarm)
• 3 years of experience with and knowledge of observability tools and methodologies, including logging, monitoring, tracing, and performance analysis platforms
• 1 year of experience with cloud-based Site Reliability Engineering (SRE) practices and with public cloud platforms such as AWS, Azure, or Google Cloud
• Expert-level knowledge of service-based and event-driven systems and infrastructure (Streams, Topics, Queues, REST)
• Expert-level knowledge of IaC automation tools (Terraform, Ansible, CloudFormation, Puppet, Chef)
• Expert-level knowledge of CI/CD automation tools (GitHub Actions, AWS CodePipeline, Google Cloud Build)
• Expert-level knowledge of web architecture including networking, infrastructure configuration and provisioning, infrastructure scaling,
Preferred Qualifications:
• AWS Certified DevOps Engineer - Professional
• Google Cloud Professional Cloud DevOps Engineer Certification
Salary Range: $123,000.00 to $215,250.00 annually + bonus + benefits
The above represents the expected salary range for this job requisition. Ultimately, in determining your pay, we’ll consider your location, experience, and other job-related factors.
We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally.
For a full list of Team Amex benefits, visit our Colleague Benefits Site.
American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. American Express will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with the requirements of applicable state and local laws, including, but not limited to, the California Fair Chance Act, the Los Angeles County Fair Chance Ordinance for Employers, and the City of Los Angeles’ Fair Chance Initiative for Hiring Ordinance. For positions covered by federal and/or state banking regulations, American Express will comply with such regulations as it relates to the consideration of applicants with criminal convictions.
We back our colleagues with the support they need to thrive, professionally and personally. That's why we have Amex Flex, our enterprise working model that provides greater flexibility to colleagues while ensuring we preserve the important aspects of our unique in-person culture. Depending on role and business needs, colleagues will either work onsite, in a hybrid model (combination of in-office and virtual days) or fully virtually.
US Job Seekers - Click to view the “Know Your Rights” poster. If the link does not work, you may access the poster by copying and pasting the following URL in a new browser window: https://www.eeoc.gov/poster
Employment eligibility to work with American Express in the United States is required as the company will not pursue visa sponsorship for these positions.