Site Reliability Engineer

CGI
Manchester, United Kingdom
DevOps Engineer, Platform Engineer, Operations Engineer, Site Reliability Engineer
Actively hiring

hackajob is partnering with CGI to fill this position. Create a profile to be automatically considered for this role and for others that match your experience.

Position Overview

We are looking for an experienced Site Reliability Engineer to support and improve the reliability, scalability, and operational performance of critical data-driven platforms.

The role focuses on Kubernetes-based environments, observability, incident response, and automation, working across engineering and platform teams to ensure systems are stable, resilient, and continuously improving.

You’ll be working in fast-paced production environments where reliability, monitoring, and rapid problem resolution are key.


What you’ll do

  • Support and maintain highly available production platforms across cloud and containerised environments
  • Manage Kubernetes clusters and Helm deployments
  • Improve monitoring, alerting, logging, and observability systems
  • Investigate incidents, perform log analysis, and drive root cause resolution
  • Participate in incident response and post-incident reviews
  • Automate repetitive operational tasks to improve efficiency
  • Collaborate with engineering and platform teams to improve resilience and scalability
  • Maintain runbooks, documentation, and operational procedures
  • Support deployment, release, and change management activities
  • Contribute to continuous service improvement and reliability initiatives

What you’ll need

  • Strong experience in SRE, DevOps, Platform Engineering, or Production Support
  • Hands-on experience with Kubernetes and Helm in production
  • Strong experience with ELK stack (Elasticsearch, Logstash, Kibana)
  • Solid incident management, troubleshooting, and root cause analysis skills
  • Strong understanding of core SRE principles:
    • Monitoring & alerting
    • Incident response
    • Observability
    • Automation
    • Production support
  • Experience working with data or analytics platforms is advantageous
  • Scripting experience (Python, Bash, etc.) is desirable
  • Exposure to CI/CD, IaC, and cloud-native tooling is beneficial
  • Strong communication and stakeholder management skills
  • Ability to work under pressure in fast-moving environments

Security & working requirements

  • Must work onsite 5 days per week
  • Must hold current HLC clearance (mandatory)
  • Experience in government, defence, or highly regulated environments is highly desirable
  • Only candidates meeting clearance requirements will be considered
