Senior Machine Learning Infrastructure Engineer

DeepRec.ai

United States

About

Base pay range: $250,000.00/yr - $300,000.00/yr
This range is provided by DeepRec.ai. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Senior Machine Learning Infra Engineer | San Francisco | Competitive Salary + Equity
Our client is an early‑stage AI company building foundation models for physics to enable end‑to‑end industrial automation, from simulation and design through optimization, validation, and production. They are assembling a small, elite, founder‑led team focused on shipping real systems into production, backed by world‑class investors and technical advisors.
They are hiring a Machine Learning Cloud Infrastructure Engineer to own the full ML infrastructure stack behind physics‑based foundation models. Working directly with the CEO and founding team, you will build, scale, and operate production‑grade ML systems used by real customers.
What you will do
Own distributed training and fine‑tuning infrastructure across multi‑GPU and multi‑node clusters
Design and operate low‑latency, highly reliable inference and model serving systems
Build secure fine‑tuning pipelines allowing customers to adapt models to their data and workflows
Deliver deployments across cloud and on‑prem environments, including enterprise and air‑gapped setups
Design data pipelines for large‑scale simulation and CFD datasets
Implement observability, monitoring, and debugging across training, serving, and data pipelines
Work directly with customers on deployment, integration, and scaling challenges
Move quickly from prototype to production infrastructure
What our client is looking for
3+ years building and scaling ML infrastructure for training, fine‑tuning, serving, or deployment
Strong experience with AWS, GCP, or Azure
Hands‑on expertise with Kubernetes, Docker, and infrastructure‑as‑code
Experience with distributed training frameworks such as PyTorch Distributed, DeepSpeed, or Ray
Proven experience building production‑grade inference systems
Strong Python skills and deep understanding of the end‑to‑end ML lifecycle
High execution velocity, strong debugging instincts, and comfort operating in ambiguity
Nice to have
Background in physics, simulation, or computer‑aided engineering software
Experience deploying ML systems into enterprise or regulated environments
Large‑scale ML data engineering and validation pipelines
Experience at high‑growth AI startups or leading AI research labs
Customer‑facing or forward‑deployed engineering experience
Open‑source contributions to ML infrastructure
This role suits someone who earns respect through hands‑on technical contribution, thrives in intense, execution‑driven environments, values deep focused work, and takes full ownership of outcomes. The company offers ownership of core infrastructure, direct collaboration with the CEO and founding team, work on high‑impact AI and physics problems, competitive compensation with meaningful equity, an in‑person‑first culture five days a week, strong benefits, daily meals, stipends, and immigration support.
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Information Technology
Industries: Research Services

Language skills

English