
Machine Learning Platform Engineer

Remote

Sign up for the chance to get matched to this role, and similar opportunities.

hackajob on-demand is partnering with an AI startup to help it hire the best talent. At on-demand, we match and speak with exceptional talent like you, and we provide insight into the problem the company is looking to solve and into its interview process.

Role: Machine Learning Platform Engineer

Opportunity: Permanent or contract

Based: London or New York (remote possible but ideally onsite in either city)

About us

We are a stealth-mode startup developing cutting-edge AI and machine learning tools for the financial sector. Our mission is to revolutionize how hedge funds leverage advanced technologies for data analysis and decision-making. We're building a diverse team of experts from various fields to create innovative solutions that push the boundaries of what's possible in financial technology.

The role

We're seeking an ML Platform Engineer to join our founding team. You'll work directly with our AI Research team to build and optimize our on-premises ML infrastructure. This is a unique opportunity to shape the foundation of our ML platform from the ground up, with a focus on high-performance, secure computing environments.

What you’ll do:

  • Design and implement scalable, on-premises infrastructure for training and deploying ML models across GPU clusters 

  • Build and maintain high-performance computing environments optimized for ML workloads 

  • Develop secure, robust data pipelines that can handle high-throughput, real-time processing requirements 

  • Create comprehensive monitoring and observability solutions for our distributed ML systems 

  • Implement testing frameworks and development workflows that accelerate our research team's productivity 

  • Collaborate closely with research scientists to translate innovative ideas into production-ready systems 

  • Make critical architectural decisions that will shape our technical infrastructure 

  • Design and implement security measures to protect proprietary systems and data

Requirements

  • 5+ years of software engineering experience, with 3+ years focused on ML infrastructure 

  • Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow) 

  • Experience building and maintaining on-premises ML infrastructure and GPU clusters 

  • Proven track record of optimizing distributed computing systems 

  • Deep understanding of MLOps, including experiment tracking, model versioning, and deployment 

  • Expertise in designing and implementing monitoring and observability solutions 

  • Strong background in software engineering best practices, including testing and CI/CD

Preferred Qualifications

  • Experience with high-performance computing infrastructure and GPU optimization 

  • Knowledge of Linux system administration and networking 

  • Background in security best practices for ML systems and data protection 

  • Experience with containerization and orchestration (Docker, Kubernetes) 

  • Track record of building developer tools and improving engineering productivity 

  • Experience collaborating with research scientists and PhD-level practitioners 

  • Familiarity with low-latency systems design

 

