Site Reliability Engineer (SRE)

Knutsford, United Kingdom
Site Reliability Engineer, DevOps Engineer, Platform Engineer
Actively hiring

Barclays

hackajob is partnering with Barclays to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

 

Step into the role of Lead Site Reliability Engineer (SRE) at Barclays, where you will be a senior technical expert responsible for driving end-to-end resilience, reliability, and scalability across our mission-critical virtual platform. This role focuses on ensuring systems are designed for fault tolerance, observability, and operational excellence.

You will perform deep technical reviews, troubleshoot complex issues, and define patterns for resiliency by design. As a hands-on engineer, you will collaborate with development and production support teams, advocate chaos engineering, and build a culture of designing for failure. This position requires strong technical breadth across infrastructure, applications, networks, databases, and integrations, combined with expertise in modern reliability engineering practices.

Key responsibilities:

  • Reliability Engineering: Drive strategies to improve reliability, maintainability, and scalability across platform components.
  • Architecture and Design Review: Conduct deep technical assessments of system architectures, identifying risks and recommending improvements for fault tolerance and disaster recovery.
  • Observability & Monitoring: Design and implement full-stack observability solutions, including metrics, logging, distributed tracing, and alerting.
  • Incident Management & Root Cause Analysis: Act as a senior escalation point for production incidents, lead RCA, and implement permanent fixes to prevent recurrence.
  • Chaos Engineering & Failure Testing: Advocate and implement chaos engineering principles to validate system resilience under real-world failure scenarios.
  • Automation & Tooling: Develop automation for failover, capacity management, and self-healing mechanisms to reduce operational risk.
  • Continuous Improvement: Analyse service risk assessments and production incidents to identify systemic issues and drive long-term improvements.

To be successful as a Lead Site Reliability Engineer (SRE), you should have experience with:

  • Technical Expertise: Proven experience building and operating fault-tolerant, highly available systems at scale.
  • Architecture & Design: Strong knowledge of distributed systems, resiliency patterns (circuit breakers, retries, failover), and disaster recovery strategies.
  • Problem-Solving: Ability to troubleshoot complex technical issues across distributed systems and perform deep root cause analysis.
  • Collaboration & Influence: Skilled at working with development, operations, and architecture teams to embed reliability into design and delivery.
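
As a rough illustration of the resiliency patterns named above (circuit breakers and retries), here is a minimal Python sketch. It is not taken from the role description or from any Barclays system. The names `CircuitBreaker`, `CircuitOpenError`, and `call_with_retries`, along with the default thresholds and delays, are hypothetical choices made for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker: closed, open, and half-open states."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable clock, useful for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retries(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry func through the breaker with exponential backoff between attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # do not keep retrying against an open circuit
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

In this sketch, retries absorb short transient faults, while the breaker stops callers from repeatedly hitting a dependency that is already failing. Failover to a secondary dependency could be layered on by catching `CircuitOpenError` at the call site.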

Some other highly valued skills may include:

  • Understanding of cloud solutions, preferably VMware products.
  • Exposure to coding in Python.

You may be assessed on the key critical skills relevant for success in the role, such as risk and controls, change and transformation, business acumen, strategic thinking, and digital and technology, as well as job-specific technical skills.

This role is based in Knutsford, with a hybrid working model that requires a minimum of two to three days per week in the office.

Purpose of the role

To apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of systems, platforms, and technology.

Accountabilities

  • Availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
  • Resolution and analysis of, and response to, system outages and disruptions, and implementation of measures to prevent similar incidents from recurring.
  • Development of tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
  • Monitoring and optimisation of system performance and resource usage, identification and resolution of bottlenecks, and implementation of best practices for performance tuning.
  • Collaboration with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and close work with other teams to ensure smooth and efficient operations.
  • Stay informed of industry technology trends and innovations, and actively contribute to the organisation's technology communities to foster a culture of technical excellence and growth.

Vice President Expectations

  • To contribute to or set strategy, drive requirements and make recommendations for change. Plan resources, budgets, and policies; manage and maintain policies/processes; deliver continuous improvements and escalate breaches of policies/procedures.
  • If managing a team, they define jobs and responsibilities, planning for the department’s future needs and operations, counselling employees on performance and contributing to employee pay decisions/changes. They may also lead a number of specialists to influence the operations of a department, in alignment with strategic as well as tactical priorities, while balancing short- and long-term goals and ensuring that budgets and schedules meet corporate requirements.
  • If the position has leadership responsibilities, People Leaders are expected to demonstrate a clear set of leadership behaviours to create an environment for colleagues to thrive and deliver to a consistently excellent standard. The four LEAD behaviours are: L – Listen and be authentic, E – Energise and inspire, A – Align across the enterprise, D – Develop others.
  • OR for an individual contributor, they will be a subject matter expert within their own discipline and will guide technical direction. They will lead collaborative, multi-year assignments, guide team members through structured assignments, and identify the need for the inclusion of other areas of specialisation to complete assignments. They will train, guide and coach less experienced specialists and provide information affecting long-term profits, organisational risks and strategic decisions.
  • Advise key stakeholders, including functional leadership teams and senior management on functional and cross functional areas of impact and alignment.
  • Manage and mitigate risks through assessment, in support of the control and governance agenda.
  • Demonstrate leadership and accountability for managing risk and strengthening controls in relation to the work your team does.
  • Demonstrate comprehensive understanding of the organisation functions to contribute to achieving the goals of the business.
  • Collaborate with other areas of work and business-aligned support areas to keep up to speed with business activity and business strategies.
  • Create solutions based on sophisticated analytical thought comparing and selecting complex alternatives. In-depth analysis with interpretative thinking will be required to define problems and develop innovative solutions.
  • Adopt and include the outcomes of extensive research in problem solving processes.
  • Seek out, build and maintain trusting relationships and partnerships with internal and external stakeholders in order to accomplish key business objectives, using influencing and negotiating skills to achieve outcomes.

All colleagues will be expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship – our moral compass, helping us do what we believe is right. They will also be expected to demonstrate the Barclays Mindset – to Empower, Challenge and Drive – the operating manual for how we behave.
