Sign up for the chance to get matched to this role, and similar opportunities.
hackajob On-Demand is currently partnering with an AI startup to help them hire the best talent. At On-Demand, we match and speak with exceptional talent like you and provide insights into the problem the company is looking to solve and into its interview process.
Role: Machine Learning Platform Engineer
Opportunity: Permanent or Contract
Based: London or New York (remote possible but ideally onsite in either city)
About us
We are a stealth-mode startup developing cutting-edge AI and machine learning tools for the financial sector. Our mission is to revolutionize how hedge funds leverage advanced technologies for data analysis and decision-making. We're building a diverse team of experts from various fields to create innovative solutions that push the boundaries of what's possible in financial technology.
The role
We're seeking an ML Platform Engineer to join our founding team. You'll work directly with our AI Research team to build and optimize our on-premises ML infrastructure. This is a unique opportunity to shape the foundation of our ML platform from the ground up, with a focus on high-performance, secure computing environments.
What you’ll do:
Design and implement scalable, on-premises infrastructure for training and deploying ML models across GPU clusters
Build and maintain high-performance computing environments optimized for ML workloads
Develop secure, robust data pipelines that can handle high-throughput, real-time processing requirements
Create comprehensive monitoring and observability solutions for our distributed ML systems
Implement testing frameworks and development workflows that accelerate our research team's productivity
Collaborate closely with research scientists to translate innovative ideas into production-ready systems
Make critical architectural decisions that will shape our technical infrastructure
Design and implement security measures to protect proprietary systems and data
Requirements
5+ years of software engineering experience, with 3+ years focused on ML infrastructure
Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow)
Experience building and maintaining on-premises ML infrastructure and GPU clusters
Proven track record of optimizing distributed computing systems
Deep understanding of MLOps, including experiment tracking, model versioning, and deployment
Expertise in designing and implementing monitoring and observability solutions
Strong background in software engineering best practices, including testing and CI/CD
Preferred Qualifications
Experience with high-performance computing infrastructure and GPU optimization
Knowledge of Linux system administration and networking
Background in security best practices for ML systems and data protection
Experience with containerization and orchestration (Docker, Kubernetes)
Track record of building developer tools and improving engineering productivity
Experience collaborating with research scientists and PhD-level practitioners
Familiarity with low-latency systems design