Site Reliability Engineer (SRE)
Position Overview
We are looking for an experienced Site Reliability Engineer to support and improve the reliability, scalability, and performance of critical data-driven platforms across complex production environments.
The role focuses on strengthening observability, incident response, automation, and platform stability across Kubernetes-based, cloud-native systems.
You’ll work closely with engineering, platform, and support teams to ensure services are highly available, well-monitored, and continuously improving.
What you’ll do
- Support and maintain highly available production platforms and services
- Manage Kubernetes clusters and Helm-based deployments
- Improve monitoring, alerting, logging, and observability systems
- Investigate incidents and perform root cause analysis
- Participate in incident response and post-incident reviews
- Automate operational tasks to reduce manual effort
- Collaborate with engineering teams to improve resilience and scalability
- Maintain runbooks, operational documentation, and support guides
- Support deployment, release, and change management processes
- Contribute to continuous reliability and performance improvements
What you’ll need
- Strong experience in SRE, DevOps, Platform Engineering, or Production Support
- Hands-on experience with Kubernetes and Helm
- Experience supporting mission-critical production systems
- Strong experience with the ELK stack (Elasticsearch, Logstash, Kibana)
- Strong troubleshooting, incident management, and RCA skills
- Understanding of core SRE practices:
  - Monitoring and alerting
  - Incident response
  - Root cause analysis
  - Automation
  - Production support and reliability engineering
- Experience working in fast-paced operational environments
- Strong communication and stakeholder management skills
Nice to have
- Experience with data or analytics platforms
- Scripting skills (Python, Bash, etc.)
- CI/CD pipelines and Infrastructure as Code
- Cloud-native platform experience
Security & working requirements
- Must work onsite 5 days per week
- Must hold current HLC clearance (mandatory)
- Experience in government, defence, or regulated environments is highly desirable
- Only candidates who meet the clearance criteria will be considered
hackajob is partnering with CGI to fill this position.