Forward Deployed Data Engineer

Mecka AI

United States

About

About Mecka AI
Mecka AI is building the data infrastructure layer for robotics and embodied AI.
We partner with leading AI labs and robotics companies to deliver high-quality, real-world datasets used to train, evaluate, and deploy robotic systems - where model performance is dictated by data quality.
The Role
We are hiring a Forward Deployed Data Engineer to operate on the frontier with customers: take messy, real-world capture data - much of it raw video - and turn it into beautiful, reliable, model-ready datasets, while owning the technical relationship end-to-end.
This is a senior, high-trust role with significant autonomy. You'll combine data engineering, hands-on analysis, and product judgment to deliver datasets customers can train and ship on - and to make our delivery systems more reliable every time you do.
What You'll Work On

Customer Delivery & Technical Ownership
Own the end-to-end delivery of customer datasets: requirements, validation, iteration, final handoff.
Be the technical point of contact: communicate clearly, set expectations, and close loops.
Turn one-off customer needs into durable internal improvements - tooling, pipelines, and standards that make every future delivery faster and safer.

Data Systems & Pipelines
Build, debug, and harden data pipelines across ingestion, transformation, QA, and export.
Work fluently across storage and database paradigms (SQL + NoSQL + object storage) and pick the right tool for the job.
Establish reliable dataset "contracts": schemas, versioning, provenance, and reproducible builds - so every dataset has a clear source of truth.

Dataset Quality & Signal
Define and measure what makes a dataset good for a given task: coverage, diversity, balance, label fidelity, and fitness for the customer's model.
Build quality scorecards and coverage/diversity reports that make dataset health legible to customers and internal teams.
Query and slice large corpora to maximize customer fit - surface exactly the data that matches a target distribution, not just bulk volume.
When the signal a customer needs is missing or weak in the raw video, diagnose it and partner with the perception/ML pipeline teams to extract or improve it upstream.

Who You Are

Required Background
5+ years in data engineering and/or backend engineering (or equivalent impact).
Strong experience with large data systems, pipelines, and analytical workflows.
Strong SQL proficiency and comfort across multiple database/storage paradigms.
Excellent engineering judgment and debugging ability in production systems.
Genuine data taste - you can look at a dataset and reason about whether it's complete, balanced, and trustworthy, not just whether the job ran.

Strong Signals
You've owned high-stakes customer deliveries with autonomy and trust.
You can translate ambiguous requirements into crisp dataset specs and execution plans.
You have strong product instincts and care about polish: "would I trust this dataset?"
You're comfortable working with unstructured, real-world data - especially video.

Nice to Have
Working literacy in video understanding, embeddings, and encoders - enough to reason about what a dataset teaches a model and where signal is missing.
Experience building data-quality, coverage, or diversity tooling.
Background adjacent to ML, computer vision, or robotics data.

Why This Role
Own the customer-facing delivery loop for world-class robotics datasets.
High autonomy, high trust, and direct impact on customer success and revenue.
Work across the full stack of the problem: data, pipelines, analysis, and delivery quality.
Sit at the exact point where raw, messy, real-world data becomes the thing that makes embodied-AI models work.

Language skills

English
