Lead Site Reliability Engineer

Nottingham, UK
DevOps Engineer Cloud Engineer Site Reliability Engineer Platform Engineer Operations Engineer
Actively hiring

Capital One

hackajob is partnering with Capital One to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

We are seeking a Lead Site Reliability Engineer (DataOps) to lead the charge on ensuring the health, reliability, and security of our critical data pipelines.

This is a senior, hands-on technical role for an expert who is comfortable with mission-critical batch data pipelines in a cloud environment that integrate with numerous real-time data sources.

You will be responsible for managing highly sensitive and critical data streams and driving strategic initiatives to minimize incidents, optimize performance, and build a resilient hybrid data environment. Your focus will be on proactive problem-solving, automation, and continuous improvement, transforming our operational processes from reactive to resilient.

What you’ll do

  • Production Support & Reliability: Act as the subject matter expert and technical lead for resolving the most complex, high-impact incidents affecting data pipelines. Manage multiple stakeholders for critical events. Perform in-depth root cause analysis to prevent recurrence, focusing on data pipelines, scheduling platforms such as Control-M and AWS-related services.
  • Data Security & Governance: Ensure the integrity and security of highly sensitive and critical data throughout the entire pipeline. Implement and enforce security best practices, including managing encryption at rest and in transit, access controls, and compliance.
  • Automation & Tooling: Develop and implement automation for common operational tasks to reduce manual toil. Focus on building tools and monitoring solutions that provide visibility into the end-to-end health of pipelines.
  • Performance Optimization: Proactively analyse and tune the performance of batch schedules and AWS resource utilization. Identify and implement optimizations to improve efficiency and reduce operational costs.
  • Collaboration & Leadership: Act as a technical leader and mentor for both onsite and offshore team members. Ensure seamless collaboration, clear communication, and consistent operational standards across a distributed team. Contribute to the long-term technical strategy for data operations including modernization efforts.

What we’re looking for

  • Demonstrable hands-on experience in a production support, site reliability, or data operations role within a large-scale data environment.
  • Experience with data distribution platforms (e.g. Ab Initio and Spark-centric solutions such as AWS Glue and EMR), including a deep understanding of ETL/ELT workflows and integration into data platforms such as Snowflake.
  • Extensive experience with scheduling platforms such as Control-M, including complex scheduling, dependencies, and managing a large batch environment.
  • Working knowledge of IBM Sterling FileGateway or similar managed file transfer (MFT) solutions (e.g. AWS Transfer Family) would be beneficial.
  • Deep knowledge of AWS and its data-related services, as well as open-source, cloud-first data-pipeline orchestration tools such as Apache Airflow.
  • Proficiency in shell scripting and Python for automation and system administration.
  • Proven ability to manage highly sensitive and critical data pipelines, with a strong understanding of security and compliance requirements.
  • Demonstrated experience working effectively with both onsite and offshore teams, ensuring seamless operational handoffs and knowledge sharing.
  • Excellent communication skills, with the ability to articulate complex technical issues to both technical teams and business stakeholders.
  • Experience with DevOps or DataOps principles and practices is essential.
