Machine Learning Evaluator

Remote
Up to $400,000/year
Artificial Intelligence Engineer, Python Developer, Machine Learning Engineer, Full Stack Python Developer
Actively hiring

Moody's Corporation

hackajob is partnering with Moody's Corporation to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.

 

At Moody's, we unite the brightest minds to turn today’s risks into tomorrow’s opportunities. We do this by striving to create an inclusive environment where everyone feels welcome to be who they are—with the freedom to exchange ideas, think innovatively, and listen to each other and customers in meaningful ways. Moody’s is transforming how the world sees risk. As a global leader in ratings and integrated risk assessment, we’re advancing AI to move from insight to action—enabling intelligence that not only understands complexity but responds to it. We decode risk to unlock opportunity, helping our clients navigate uncertainty with clarity, speed, and confidence.

If you are excited about this opportunity but do not meet every single requirement, please apply! You may still be a great fit for this role or other open roles. We are seeking candidates who model our values: invest in every relationship, lead with curiosity, champion diverse perspectives, turn inputs into actions, and uphold trust through integrity.

Skills and Competencies

  • Ph.D. in Computer Science, Machine Learning, Natural Language Processing, Statistics, or a related quantitative field; or Master’s degree with 2-3 years of experience in machine learning evaluation or a related area
  • Strong foundations in statistical methods, experimental design, and hypothesis testing
  • Experience evaluating machine learning or NLP models, including designing experiments and interpreting results
  • Familiarity with LLM evaluation benchmarks and methodologies
  • Strong programming skills in Python or R
  • Excellent communication skills in English (both written and verbal)

Preferred:

  • Experience evaluating LLMs or generative AI systems
  • Experience with production machine learning systems
  • Exposure to cloud platforms such as AWS, GCP, or Azure
  • Publications or demonstrated work in model evaluation, benchmarking, or related areas

Education

  • Ph.D. in Computer Science, Machine Learning, Natural Language Processing, Statistics, or a related quantitative field; or Master’s degree with 2-3 years of experience in machine learning evaluation or a related area

Responsibilities

  • Evaluate and validate large language models for production-grade analytical and decision-support systems
  • Design and implement evaluation frameworks for assessing LLM performance in credit analytics and decision-support contexts
  • Develop metrics and benchmarks to measure model robustness, reliability, consistency, and output quality
  • Analyze model behavior across diverse inputs, identifying failure modes, edge cases, and areas for improvement
  • Collaborate with model development and deployment teams to integrate validation processes into the model lifecycle
  • Conduct systematic assessments of model stability over time and across updates
  • Evaluate model outputs for bias, fairness, and economic relevance to credit risk applications
  • Develop and maintain documentation for evaluation methodologies, findings, and recommendations
  • Contribute to the advancement of best practices for LLM evaluation within the Credit COE

About the Team

Our Credit Center of Excellence (COE) team is responsible for maintaining and enhancing our industry-leading credit analytics and predictive modelling capabilities. We work closely with various teams including product management, commercial strategy, and go-to-market leaders to ensure the delivery of high-quality credit risk assessments and solutions. By joining our team, you will be part of exciting work in credit analytics with a global team spread across all US time zones, GMT, and GMT+1.
