This job posting is no longer available
Senior Principal Engineer – Machine Learning Infrastructure
- United States
About
- Role: Senior Principal Engineer – Machine Learning Infrastructure
- Location: Remote (U.S. only)
- Employment Type: Full-time
Mission Summary
On behalf of one of our clients, we are hiring a Senior Principal Engineer, Machine Learning Infrastructure, to lead the technical vision and architecture for systems that power the full machine learning lifecycle, from training dataset generation to model training, evaluation, and deployment.
This is a mission-critical leadership role within the ML Infrastructure organization. You will shape platforms that process terabytes of daily sensor data and petabyte-scale datasets, supporting large-scale autonomous systems and advanced machine learning workflows.
This role is ideal for a senior technologist with deep expertise in ML systems, distributed data platforms, and performance-oriented infrastructure, who thrives on building for scale, reliability, and engineering excellence.
What You'll Do
- Own and evolve the architecture of enterprise-scale ML infrastructure, enabling scalable storage, curation, and access for 100+ engineers and researchers
- Design infrastructure supporting petabyte-scale ML workflows, including multimodal perception data, simulation outputs, synthetic data, and continuous training pipelines
- Architect high-throughput distributed training systems on large GPU clusters, improving utilization, throughput, and job efficiency
- Establish robust data governance, observability, lineage, and retention strategies to ensure compliance, reproducibility, and long-term usability
- Collaborate cross-functionally with ML engineers, data engineers, platform teams, and DevOps to tightly align infrastructure with user workflows
- Define and drive the technical roadmap and long-term strategy for ML infrastructure, incorporating industry best practices and open-source innovation
- Mentor and influence engineers across teams, promoting excellence in distributed systems, ML platforms, and large-scale data management
What We're Looking For
- 15+ years of meaningful software engineering experience, including architecture-level ownership of ML or data infrastructure
- Proven experience designing and operating ML platforms supporting large-scale training and inference workloads
- Deep expertise in distributed storage systems, high-volume data pipelines, and ML-oriented data compression strategies
- Strong proficiency with Linux systems, Python, and C++ or other performance-oriented languages
- Experience operating in hybrid environments, including bare metal, HPC, and public cloud platforms (AWS, GCP, or Azure)
- Demonstrated ability to lead cross-organization initiatives and influence system-level design across platform and ML teams
- Prior experience in robotics, autonomous systems, or safety-critical domains is strongly preferred
Bonus Points
- Experience building or leading infrastructure at a top-tier ML, AI, or autonomous systems organization
- Contributions to open-source ML or data infrastructure projects
Compensation & Benefits
Base Salary Range (U.S.): $202,000 – $290,000 USD
Compensation is based on experience, location, and role scope. Additional compensation may include bonus and/or equity.
A comprehensive benefits package may include medical, dental, vision, retirement plans, paid time off, and additional competitive offerings.
Equal Opportunity
Our client is an Equal Opportunity Employer committed to building a diverse and inclusive workplace. Employment eligibility verification may be required in accordance with applicable laws.
Language Skills
- English