Remote CUDA Kernel Optimizer - ML Engineer - AI Trainer ($120-$250 per hour)

Mercor • Raleigh, North Carolina, United States

This job posting is no longer available

About

### **1) Role Overview**

Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

### **2) Key Responsibilities**

- Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
- Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
- Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
- Report performance metrics, analyze speedups, and propose architectural improvements.
- Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
- Produce well-documented, reproducible benchmarks and performance write-ups.

### **3) Ideal Qualifications**

- Deep expertise in CUDA programming, GPU architecture, and memory optimization.
- Proven ability to achieve quantifiable performance improvements across hardware generations.
- Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
- Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
- Strong communication skills and independent problem-solving ability.
- Demonstrated open-source, research, or performance benchmarking contributions.

### **4) More About the Opportunity**

- Ideal for independent contractors who thrive in performance-critical, systems-level work.
- Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
- Work is fully remote and asynchronous; deliverables are outcome-driven.
- Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.

### **5) Compensation & Contract Terms**

- Typical range: **$120–$250/hour**, depending on scope, specialization, and results achieved. Payments will be based on accepted task output rather than a flat hourly rate.
- Structured as a **contract-based engagement**, not an employment relationship.
- Compensation tied to measurable deliverables or agreed milestones.
- Confidentiality, IP, and NDA terms as defined per engagement.

### **6) Application Process**

- Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
- Include links to relevant GitHub repos, papers, or benchmarks if available.
- Indicate your hourly rate, time availability, and preferred engagement length.
- Selected experts may complete a small, paid pilot kernel optimization project.

### **7) About Mercor**

- **Mercor** connects domain experts with top AI research and technology organizations through project-based contracts.
- Contractors operate independently, with full flexibility over methods, timelines, and tools.
- Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.

Language skills

English

Notice to users

This posting was published by one of our partners. You can view the original posting here.
