Backend Software Engineer (ML Infra)
Rockstar
- San Francisco, California, United States
About
Design and implement backend systems that support large-scale ML workloads, including fine-tuning and reinforcement learning. Build distributed training and inference pipelines that are efficient, fault-tolerant, and observable. Develop internal developer tools and platforms that make it easier for ML engineers to train, evaluate, and deploy models.
Cloud & Systems Engineering
Work on cloud-native systems using containers and orchestration (e.g., Kubernetes). Optimize systems for performance, reliability, and cost efficiency, especially for GPU-heavy workloads. Implement monitoring, logging, and observability for long-running training jobs and production services.
Collaborate with ML Engineers
Partner closely with ML engineers to support evolving model architectures, training workflows, and evaluation needs. Translate ML requirements into scalable backend and infrastructure solutions.
Who You Are
Required
- 1–3 years of backend engineering experience, ideally working on production systems.
- Strong fundamentals in distributed systems, networking, and backend architecture.
- Experience building systems that scale under real load.
- Comfortable working in Python and/or Go (or similar backend languages).
- Excited to work on-site in San Francisco with a fast-moving early-stage team.
Strongly Preferred
- Experience with or exposure to ML infrastructure or ML platforms.
- Familiarity with GPU workloads, training pipelines, or inference systems.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Contributions to or deep familiarity with ML infrastructure libraries such as Ray, vLLM, SGLang, or similar distributed ML systems.
Bonus
- Computer science background from a top-tier program or equivalent demonstrated excellence.
- Open-source contributions, research projects, or side projects in systems or ML infrastructure.
- A track record of high ownership and technical curiosity.
Language skills
- English
Note for users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.