This job posting is no longer available
About The Role
We’re looking for a Machine Learning Engineer to contribute to high‑performance distributed training infrastructure for RL at scale. You’ll work directly with our founding team and design partners to push the boundaries of what’s possible with post‑training and continual learning systems. This role requires expertise in RL algorithms, distributed training, and low‑level optimization. You’ll have exceptional agency to make impactful decisions while working in a fast‑paced, customer‑driven environment.
Responsibilities
Distributed Training Infrastructure: implement new RL algorithms and build scalable post‑training pipelines
Resource Management & Optimization: design infrastructure systems for efficient GPU utilization and dynamic resource allocation
Customer‑Facing Work: work directly with customers on production deployments and custom model development
Technology
Backend: Python (FastAPI), Golang
Frontend: React, TypeScript, Next.js
Cloud Infrastructure: AWS Fargate, Docker, Kubernetes, AWS SageMaker
ML Frameworks: Verl / slime / Megatron‑LM / SkyRL, PyTorch (FSDP experience is a plus), vLLM / SGLang
Databases: DynamoDB, S3
Language skills
- English