Forward Deployed Data Engineer

Mecka AI

United States

About Mecka AI
Mecka AI is building the data infrastructure layer for robotics and embodied AI.
We partner with leading AI labs and robotics companies to deliver high-quality, real-world datasets used to train, evaluate, and deploy robotic systems - where model performance is dictated by data quality.
The Role
We are hiring a Forward Deployed Data Engineer to operate on the frontier with customers: take messy, real-world capture data - much of it raw video - and turn it into beautiful, reliable, model-ready datasets, while owning the technical relationship end-to-end.
This is a senior, high-trust role with significant autonomy. You'll combine data engineering, hands-on analysis, and product judgment to deliver datasets customers can train and ship on - and to make our delivery systems more reliable every time you do.
What You'll Work On

Customer Delivery & Technical Ownership
Own the end-to-end delivery of customer datasets: requirements, validation, iteration, final handoff.
Be the technical point of contact: communicate clearly, set expectations, and close loops.
Turn one-off customer needs into durable internal improvements - tooling, pipelines, and standards that make every future delivery faster and safer.

Data Systems & Pipelines
Build, debug, and harden data pipelines across ingestion, transformation, QA, and export.
Work fluently across storage and database paradigms (SQL + NoSQL + object storage) and pick the right tool for the job.
Establish reliable dataset "contracts": schemas, versioning, provenance, and reproducible builds - so every dataset has a clear source of truth.

Dataset Quality & Signal
Define and measure what makes a dataset good for a given task: coverage, diversity, balance, label fidelity, and fitness for the customer's model.
Build quality scorecards and coverage/diversity reports that make dataset health legible to customers and internal teams.
Query and slice large corpora to maximize customer fit - surface exactly the data that matches a target distribution, not just bulk volume.
When the signal a customer needs is missing or weak in the raw video, diagnose it and partner with the perception/ML pipeline teams to extract or improve it upstream.

Who You Are

Required Background
5+ years in data engineering and/or backend engineering (or equivalent impact).
Strong experience with large data systems, pipelines, and analytical workflows.
Strong SQL proficiency and comfort across multiple database/storage paradigms.
Excellent engineering judgment and debugging ability in production systems.
Genuine data taste - you can look at a dataset and reason about whether it's complete, balanced, and trustworthy, not just whether the job ran.

Strong Signals
You've owned high-stakes customer deliveries with autonomy and trust.
You can translate ambiguous requirements into crisp dataset specs and execution plans.
You have strong product instincts and care about polish: "would I trust this dataset?"
You're comfortable working with unstructured, real-world data - especially video.

Nice to Have
Working literacy in video understanding, embeddings, and encoders - enough to reason about what a dataset teaches a model and where signal is missing.
Experience building data-quality, coverage, or diversity tooling.
Background adjacent to ML, computer vision, or robotics data.

Why This Role
Own the customer-facing delivery loop for world-class robotics datasets.
High autonomy, high trust, and direct impact on customer success and revenue.
Work across the full stack of the problem: data, pipelines, analysis, and delivery quality.
Sit at the exact point where raw, messy, real-world data becomes the thing that makes embodied-AI models work.

Languages

English
