hackajob is partnering with BMC Software to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.
We are looking for a Principal DevOps Engineer to help build and operate our next-generation Agentic-AI Data Management platform from 0-1. This is a hands-on, delivery-focused role for a senior/principal engineer who thrives in early-stage environments, owns reliability and automation end-to-end, and takes pride in running production systems used by external enterprise customers.
You will work alongside an established architect and senior product engineers, shape platform and operational architecture, and spend a significant portion of your time designing, building, and evolving the cloud, CI/CD, and runtime foundations of the product.
Here is how, through this exciting role, YOU will contribute to BMC's and your own success:
• Platform & DevOps Engineering (Primary Focus) – Design, build, and operate the core cloud and Kubernetes-based platform that underpins a 0-1 data automation and management product, taking infrastructure and operational capabilities from concept through production.
• Hands-on Automation – Write production-grade automation in Python, Go, or similar languages to eliminate manual work across provisioning, deployment, scaling, monitoring, and incident response.
• Cloud & Kubernetes Architecture – Design and evolve Kubernetes-based platforms using Docker, Helm, and cloud-native services, balancing speed of delivery with long-term operability and cost control.
• SRE & Reliability Practices – Establish and enforce SRE best practices including SLIs/SLOs, alerting strategies, error budgets, incident management, and post-incident reviews to ensure enterprise-grade reliability.
• CI/CD & Release Engineering – Build and maintain robust CI/CD pipelines (e.g., GitHub Actions, Jenkins) to support frequent, safe, and repeatable deployments across multiple environments.
• Security & Compliance Enablement – Manage cloud environments in accordance with company security guidelines, embedding security, compliance, and access controls directly into infrastructure and pipelines.
• Operational Tooling – Build and maintain internal tools, services, and automation that support deployment, observability, debugging, and operational excellence while reducing human error.
• Integration & Cloud Enablement – Support deployments across AWS including integrations with enterprise systems and geographically redundant, highly available services.
• Product & Engineering Collaboration – Work closely with product engineering teams to design operable systems, influence architectural decisions, and ensure production realities inform development choices early.
• Founder-Level Ownership – Act with strong ownership: identify operational gaps, propose pragmatic solutions, and move work forward without waiting for perfect requirements or ideal conditions.
To ensure you’re set up for success, you will bring the following skillset & experience:
• 10+ years of professional engineering experience, including building, deploying, and operating enterprise B2B systems in production.
• Strong experience designing and operating cloud-native platforms on one or more major cloud providers (AWS, Azure, GCP).
• Deep hands-on experience with Kubernetes, Docker, Helm, and microservice-based architectures in real production environments.
• Strong automation skills using Python, Go, or similar languages, with a bias toward eliminating manual operational work.
• Extensive experience with CI/CD pipelines, source control (Git/GitHub), and release engineering practices.
• Strong Linux/Unix systems knowledge and experience operating distributed systems at scale.
• Solid understanding of networking, security, and cloud infrastructure fundamentals.
• Experience with observability tooling (metrics, logging, tracing) and production debugging.
• Comfort operating in ambiguous, startup-style environments where DevOps engineers are expected to lead, not just support.
• Familiarity with configuration management and automation tools (e.g., Puppet, Chef, or modern equivalents).
Whilst these are nice to have, our team can help you develop the following skills:
• Experience operating large-scale data platforms and data orchestration systems from an SRE/DevOps perspective.
• Familiarity with AI/ML-enabled platforms, including how LLM-driven systems impact reliability, cost, and observability.
• Strong exposure to open-source technologies and tooling across the DevOps ecosystem.