Sign up for the chance to get matched to this role, and similar opportunities.
Machine Learning Platform Engineer - Stealth AI Startup
About them
They are a stealth-mode startup developing cutting-edge AI and machine learning tools for the financial sector. Their mission is to revolutionize how hedge funds leverage advanced technologies for data analysis and decision-making. They're building a diverse team of experts from various fields to create innovative solutions that push the boundaries of what's possible in financial technology.
The role
They're seeking an ML Platform Engineer to join their founding team. You'll work directly with their AI Research team to build and optimize their on-premises ML infrastructure. This is a unique opportunity to shape the foundation of their ML platform from the ground up, with a focus on high-performance, secure computing environments.
What you’ll do
• Design and implement scalable, on-premises infrastructure for training and deploying ML models across GPU clusters
• Build and maintain high-performance computing environments optimized for ML workloads
• Develop secure, robust data pipelines that can handle high-throughput, real-time processing requirements
• Create comprehensive monitoring and observability solutions for their distributed ML systems
• Implement testing frameworks and development workflows that accelerate their research team's productivity
• Collaborate closely with research scientists to translate innovative ideas into production-ready systems
• Make critical architectural decisions that will shape their technical infrastructure
• Design and implement security measures to protect proprietary systems and data
Requirements
• 5+ years of software engineering experience, with 3+ years focused on ML infrastructure
• Strong programming skills in Python and experience with ML frameworks (PyTorch, TensorFlow)
• Experience building and maintaining on-premises ML infrastructure and GPU clusters
• Proven track record of optimizing distributed computing systems
• Deep understanding of MLOps, including experiment tracking, model versioning, and deployment
• Expertise in designing and implementing monitoring and observability solutions
• Strong background in software engineering best practices, including testing and CI/CD
Preferred Qualifications
• Experience with high-performance computing infrastructure and GPU optimization
• Knowledge of Linux system administration and networking
• Background in security best practices for ML systems and data protection
• Experience with containerization and orchestration (Docker, Kubernetes)
• Track record of building developer tools and improving engineering productivity
• Experience collaborating with research scientists and PhD-level practitioners
• Familiarity with low-latency systems design