Site Reliability Engineer

BT
London, UK
Full Stack Python Developer, Cloud Engineer, Site Reliability Engineer, DevOps Engineer, Java Developer, Python Developer, Full Stack Java Developer
Actively hiring

hackajob is partnering with BT to fill this position. Create a profile to be automatically considered for this role and others that match your experience.

 

Why this job matters

As a Site Reliability Engineer (SRE), you will play a critical role in ensuring BT delivers exceptional service performance, reliability, and availability across its digital platforms. In a fast-paced, cloud-driven AI environment where customers expect seamless experiences, this position enables scalable, fault-tolerant, and cost-effective solutions through cross-team collaboration, automation, monitoring, and resilience strategies. By minimising downtime, reducing operational risk, and accelerating innovation, you will safeguard BT’s reputation for reliability while empowering the business to adapt quickly to emerging technologies and deliver consistent value to customers worldwide.

What you’ll be doing

  • Implement and optimise CI/CD pipelines, automation frameworks, and infrastructure-as-code solutions using AWS, GitOps, and container technologies.
  • Design, develop, and troubleshoot large-scale distributed systems across on-prem and cloud environments, ensuring reliability and scalability.
  • Lead performance and scale testing, monitoring, and analysis to improve system stability, security, and efficiency.
  • Drive automation initiatives to eliminate manual toil, reduce detection and resolution times, and enhance operational resilience.
  • Proactively identify and mitigate risks, perform root cause analysis, and implement preventive measures following incidents.
  • Champion best practices in Site Reliability Engineering, mentor team members, and share knowledge on emerging trends and technologies.
  • Collaborate across organisational boundaries to deliver improvements aligned with broader SRE initiatives.

Experience you'll have

Mandatory:

  • A deep understanding of full-stack monitoring solutions, such as Dynatrace, to track the current end-to-end performance and trends of owned CDO applications.
  • Strong proficiency in one or more programming languages (e.g. Java, Python). 
  • Experience with cloud platforms (AWS, Azure, or GCP). 
  • Solid understanding of software architecture, design patterns, and microservices. 
  • Familiarity with CI/CD tools and DevOps practices. 

Desirable:

  • AIOps fundamentals (cross-domain telemetry ingestion, event correlation, topology/context building, and remediation augmentation).
  • Agentic/autonomous observability skills (using intelligent agents to detect anomalies, correlate signals, and trigger guarded remediations to cut MTTR).
  • AI-assisted alerting and noise reduction (designing contextual, business-impact-aware alerts; prioritisation via ML).
