Data Engineer, Machine Learning
SESAME
- San Francisco, California, United States
About
Partner directly with ML engineers to understand data requirements for new models and experiments, and deliver datasets that meet those needs.
Build and maintain infrastructure for dataset versioning, lineage tracking, and reproducibility — so any training run can be traced back to its exact data.
Develop data quality frameworks that catch issues before they become model quality issues: schema validation, drift detection, and coverage monitoring.
Optimize large‑scale data processing for cost and performance across Sesame’s cloud infrastructure.
Build tooling that makes it easy for ML engineers and researchers to discover, explore, and request data independently.
Define and enforce data governance and privacy standards, particularly around sensitive conversational and voice data.
Contribute to architecture decisions around Sesame’s broader data platform as the team and data volume grow.
Required Qualifications:
5+ years in data engineering, with meaningful experience supporting ML or AI teams specifically.
Strong SQL and Python skills — you’ll use both daily.
Experience building and operating ETL/ELT pipelines at scale using modern data platforms and tooling.
Experience with workflow orchestration systems such as Airflow, Dagster, or Prefect.
Hands‑on experience with ML data workflows: training data pipelines, dataset versioning, data labeling pipelines, or model evaluation data.
A solid understanding of how ML teams work — you don’t need to train models; what matters is understanding what makes a good training dataset and why data quality directly affects model performance.
Comfort working with unstructured and semi‑structured data — audio, text, JSON logs — not just clean relational tables.
Strong communication skills. You’ll be embedded with ML engineers and need to bridge data systems and model requirements effectively.
Preferred Qualifications:
Vector databases, embedding storage, or feature stores.
Data from hardware or embedded systems: telemetry, sensors, real‑time streams.
Distributed compute frameworks for large‑scale data processing such as Ray or Spark.
Kubernetes and managed Kubernetes environments such as GKE or EKS.
Data privacy frameworks, especially around voice or conversational data.
Building internal tooling or self‑serve data platforms.
Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities.
Full‑time Employee Benefits:
401(k) max employer match: 3.5% of compensation
100% employer‑paid health, vision, and dental benefits for you and your dependents
Unlimited PTO and sick time
Flexible spending account with employer matching up to $1,650/year (medical FSA)
Guardian Employee Assistance Program (EAP)
Opportunity to share in the company’s success with competitive stock options
Benefits do not apply to contingent/contract workers.
Language skills
- English
Note for users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.