CUDA Engineering Expert (Train AI Models Part Time!)

Remote
Any

Mercor

hackajob is partnering with Mercor to fill this position. Create a profile to be automatically considered for this role—and others that match your experience.


## **1. Role Overview**

Mercor is seeking GPU kernel optimization experts to contribute to a project with a leading AI lab. This opportunity is designed for freelancers with strong C++ skills, practical GPU programming experience, and the ability to improve kernel performance using profiler-guided analysis. You’ll help evaluate, optimize, and reason about GPU kernels across modern hardware environments. This is a contract-based opportunity for specialists who enjoy squeezing performance out of modern GPU architectures.

## **2. Key Responsibilities**

- Analyze and optimize GPU kernels for performance, efficiency, and hardware utilization
- Use profiler metrics such as L2 cache hit rate, L2 throughput, occupancy, and related signals to guide kernel improvements
- Review GPU kernel implementations and identify bottlenecks without requiring extensive background in the underlying algorithms
- Write, modify, and reason about C++17, Python, and GPU programming code
- Apply CUDA, HIP, shader programming, or related kernel programming expertise to improve performance outcomes
- Document optimization decisions clearly, including when specific profiler metrics are or are not useful

## **3. Ideal Qualifications**

- Available to work at least 20 hrs/wk
- Fluent in core C++ features through C++17
- Working knowledge of Python and Git
- Fluent in at least one GPU programming model or language, such as CUDA, HIP, Slang, HLSL, GLSL, or a related kernel programming language
- At least 1 year of professional or graduate-level research experience working with GPUs
- Strong understanding of GPU profiler performance metrics and how to use them to optimize kernels
- Ability to optimize GPU kernels without needing deep prior context on every algorithm
- Experience with CUDA, HIP, CUDA C++ Core Libraries, inline PTX assembly, or tensor core-level optimization is a plus
- Experience optimizing kernels for NVIDIA Blackwell hardware is a plus
- Familiarity with Nsight Compute is a plus
- Prior experience at GPU hardware companies such as NVIDIA, AMD, or Qualcomm is a plus
- Open-source contributions related to GPU kernel optimization are a plus

## **4. Application Process**

- Submit your resume or relevant technical background to get started
- Qualified applicants may be asked to complete a brief technical assessment or submit additional information
