Position Overview
We are looking for an experienced Site Reliability Engineer to support and improve the reliability, scalability, and operational performance of critical data-driven platforms.
The role focuses on Kubernetes-based environments, observability, incident response, and automation, working across engineering and platform teams to ensure systems are stable, resilient, and continuously improving.
You’ll be working in fast-paced production environments where reliability, monitoring, and rapid problem resolution are key.
What you’ll do
- Support and maintain highly available production platforms across cloud and containerised environments
- Manage Kubernetes clusters and Helm deployments
- Improve monitoring, alerting, logging, and observability systems
- Investigate incidents, analyse logs, and drive issues through to root-cause resolution
- Participate in incident response and post-incident reviews
- Automate repetitive operational tasks to improve efficiency
- Collaborate with engineering and platform teams to improve resilience and scalability
- Maintain runbooks, documentation, and operational procedures
- Support deployment, release, and change management activities
- Contribute to continuous service improvement and reliability initiatives
What you’ll need
- Strong experience in SRE, DevOps, Platform Engineering, or Production Support
- Hands-on experience with Kubernetes and Helm in production
- Strong experience with ELK stack (Elasticsearch, Logstash, Kibana)
- Solid incident management, troubleshooting, and root cause analysis skills
- Strong understanding of core SRE principles:
  - Monitoring & alerting
  - Incident response
  - Observability
  - Automation
  - Production support
- Experience working with data or analytics platforms (advantageous)
- Scripting experience (Python, Bash, etc.) desirable
- Exposure to CI/CD, IaC, and cloud-native tooling beneficial
- Strong communication and stakeholder management skills
- Ability to work under pressure in fast-moving environments
Security & working requirements
- Must work onsite 5 days per week
- Must hold current HLC clearance (mandatory; no sponsorship implied)
- Experience in government, defence, or highly regulated environments is highly desirable
- Only candidates meeting clearance requirements will be considered
hackajob is partnering with CGI to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.