Human Baseliner for Open-Ended ML Research Tasks (Train AI Models Part Time!)

Remote

Machine Learning Engineer, Research Scientist, AI Researcher

Mercor

hackajob is partnering with Mercor to fill this position. Create a profile to be automatically considered for this role and for others that match your experience.

## Overview

We are hiring experienced machine learning engineers and researchers to serve as **human baseliners** for evaluations of open-ended machine learning research tasks. These evaluations measure how well AI agents perform on realistic AI R&D problems. To interpret agent performance, we also need strong human reference points: skilled practitioners attempting the same tasks under the same time and compute constraints.

As a baseliner, you will complete self-contained ML research tasks in a sandboxed environment, working independently with your preferred tools and workflow. Your performance will be used as a benchmark against which frontier-model agents are evaluated.

## What You’ll Do

- Attempt open-ended machine learning research tasks under a fixed time and compute budget (work trial)
- Work independently in a sandboxed Linux environment with internet access
- Use your preferred tooling, including IDEs and AI coding assistants such as Cursor, Claude Code, and ChatGPT
- Record your full working session via screen recording
- Complete a short pre-task and post-task questionnaire
- Submit your final work product, screen recording, and completed questionnaires

After this, you will be hired for a longer commitment.

## Commitment

- Minimum **20 hours per week if selected**
- More availability is strongly preferred

## Requirements

Candidates must meet **all** of the following:

- **3+ years of machine learning experience**
  - Time spent in a PhD program counts toward this requirement
  - Undergraduate and master’s experience does not count
- Attended a **top-100 university** or worked at **FAANG or a comparable company**
- Experience with at least one major ML framework such as **PyTorch, JAX, or TensorFlow**
- Deep, hands-on expertise in at least one of the focus areas below:
  - Pretraining under tight data and compute budgets
  - PPO, reward shaping, custom `gym` / `gymnasium` environments, and throughput tuning
  - Full fine-tuning, LoRA, QLoRA, DPO, RLHF, RLAIF, and distillation
  - Large-scale corpus filtering, deduplication, subsampling, and benchmark contamination avoidance
  - Architecture design under strict parameter-count or size constraints
  - Modifying pretrained architectures, including attention patterns, pooling heads, or training objectives
  - Contrastive training for embedding or retrieval models
  - Generative vision or video modeling
  - Multilingual or low-resource language experience
  - Image or video data pipelines at scale
  - Experience balancing competing model objectives such as safety and capability
  - Prior work as an ML evaluator, red-teamer, or baseliner

## Required Domain Expertise

Candidates must have strong practical experience in **at least one** of the following:

- **Pretraining**: training transformer language models from scratch
- **Reinforcement learning**: training agents in custom or existing environments
- **Post-training**: fine-tuning and aligning LLMs
- **Dataset curation**: building and cleaning large text corpora for LLM training
- **Model architecture**: designing and modifying neural network architectures

## Logistics (work trial requirements)

- One baseline attempt per contractor per task
- Each task may only be attempted once by a given contractor
- All work is confidential and covered by NDA
- Compute and environment are provided; no personal GPU is required
