Data Engineer
Mach9
- San Francisco, California, United States
About
This role is ideal for an engineer who loves puzzle-hunting: reverse-engineering sparsely documented formats, wrangling coordinate systems and transforms, and hunting down strange camera projection issues.
You’ll sit at the interface between our customers and our product. This role is at the front of everything we do: our models are only as good as the data feeding them, and you'll be the one making messy real-world sensor data trustworthy at scale.
Responsibilities
Develop and maintain scalable, reproducible workflows for ingesting and processing large volumes of point cloud, imagery, and geospatial data.
Convert datasets from various sensor providers into Mach9's standardized internal formats.
Build CI/CD pipelines and automated checks that verify the correctness and consistency of data pipelines, including regression detection on dataset processing.
Optimize processing performance, query speed, and storage efficiency across large geospatial datasets.
Work closely with the customer success team to efficiently resolve issues and unblock customer projects.
Build and maintain an agentic harness for automated dataset triage and code patching that proposes or applies fixes automatically and escalates to a human when judgment is needed.
Work closely with ML and product teams to make data readily usable for training, inference and visualization.
Work closely with customers and data-provider partners to facilitate data integration (with occasional travel).
Puzzle-hunting: work with data formats that have sparse or missing documentation.
Requirements
Strong software development, problem‑solving, and debugging skills, with hands‑on experience building production systems in Python.
Solid foundation in distributed systems and parallel computing.
Comfort operating with ambiguity — able to dig into undocumented or messy data formats, reverse‑engineer how they work, and make steady progress without a clear spec.
Experience building agentic systems and setting up agent harnesses — orchestrating LLM‑driven workflows for triage, debugging, or automated code patching.
Strong communication and collaboration skills, with the ability to work across ML, product, and customer‑facing teams.
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Bonus qualifications
Understanding of geospatial data formats (e.g., LAS/LAZ, COPC, E57, GeoTIFF, Shapefiles) and tooling (e.g., GDAL, PDAL, untwine, laz‑perf).
Expertise in designing and managing data schemas and storage systems for geospatial data (e.g., Postgres/PostGIS, AWS S3).
Experience with large‑scale data processing frameworks and cloud platforms (e.g., Spark, AWS Batch).
Familiarity with coordinate reference systems and transforms (CRS, WKT, pyproj, affine transforms).
Experience building data versioning, lineage, or artifact‑tracking systems.
Experience operating data pipelines that feed ML training and inference.
Familiarity with C++.
Language skills
- English
Notice to users
This listing comes from a TieTalent partner platform; applications are submitted directly on that platform's site.