Founding Engineer - Robotics Data Infrastructure
Neural Motion
- United States
About
Today, robotics data is fragmented across embodiments, formats, and pipelines. This prevents models from learning shared priors and blocks the scaling we’ve seen in language and vision. Our goal is to fix this by building a universal data pipeline and cross-embodiment representation layer that unifies real-world logs, simulation, and multimodal datasets into a single, composable system.
This platform will power:
- Public datasets and tooling for researchers
- Data pipelines and sourcing infrastructure for enterprise robotics and AI labs
- Cross-embodiment learning from large, real-world datasets
We are looking for a Founding Engineer to own and drive core parts of this system, from large-scale data pipelines to embodiment-aware transformations. As one of the earliest engineers, you will be at the forefront of the mission to let knowledge learned on one robot be reused across many, a capability that could reshape physical AI.
What You’ll Work On
You will operate at the intersection of data systems, cloud infrastructure, and robotics learning.
Core Areas
- Design and build high-throughput data pipelines for ingesting, processing, and standardizing robotics datasets
- Architect distributed systems and microservices for robotics data processing and dataset infrastructure
- Develop the data compiler layer that standardizes raw logs into a unified representation
- Build cross-embodiment transformation pipelines (retargeting, normalization, alignment)
- Integrate multimodal augmentation models (vision, language, SLAM, simulation)
- Enable real ↔ sim pipelines and unified evaluation frameworks
- Build tooling for: dataset ingestion & validation, annotation and enrichment, and dataset versioning and reproducibility
Product Surfaces
- Public dataset platform (APIs, SDKs, data loaders)
- Internal pipelines for enterprise data sourcing and validation
- Interfaces for model training and evaluation
Technical Ownership
- Helping define the architecture for robotics dataset infrastructure and pipelines
- Working directly with the founding team on product and technical direction
Research Collaboration
Neural Motion is actively exploring research directions in cross-embodiment robot learning and dataset representations. You will collaborate with robotics researchers working on these problems and help translate research results into practical infrastructure and tooling.
- Integrating research outputs from robotics learning experiments into platform infrastructure
- Supporting experiments around cross-embodiment datasets
Who You Are
We are open to two strong profiles, ideally combined:
(A) Infrastructure / Distributed Systems Engineer
- Experience building large-scale data systems (TB–PB scale)
- Strong background in:
  - distributed systems
  - streaming pipelines
  - microservices architecture
- Hands-on with tools such as:
  - Kafka / Pulsar
  - Temporal / Airflow / workflow orchestration
  - AWS (S3, SQS, Lambda, ECS/EKS) / GCP equivalents
- Experience designing robust, fault-tolerant pipelines
- Strong backend engineering skills (Python, Go, or similar)
(B) Robotics / Robot Learning Engineer
- Experience in robot learning / embodied AI / manipulation
- Familiarity with:
  - VLA / world models
  - imitation learning / RL
  - dataset design for robotics
- Strong understanding of:
  - kinematics (FK/IK)
  - retargeting across embodiments
  - coordinate frames and calibration
- Experience working with:
  - ROS / URDF
  - simulation tools (Isaac Gym, MuJoCo, etc.)
- Good intuition for what makes high-quality robotics data
Ideal Candidate
- Driven by the mission to define new, foundational infrastructure for an entire field
- Has experience in both infrastructure and robotics, or has worked closely across both
- Thinks in systems: not just models or pipelines, but how everything composes
- Cares deeply about data quality, structure, and scalability
- Comfortable working in an ambiguous, fast-moving environment
Bonus Points
- Experience with large multimodal datasets (video, sensor logs, etc.)
- Experience with dataset platforms (HuggingFace, TFDS, RLDS, etc.)
- Experience building internal tools for ML/data teams
- Exposure to simulation ↔ real transfer systems
- Startup or zero-to-one experience
Location / Setup
San Francisco/Remote
Compensation / Equity
Compensation for this role includes:
- $150,000 – $220,000/yr base salary + meaningful early-stage equity
Language skills
- English