This job posting is no longer available
Data Engineer Scalable Pipelines NYC
Guac
- New York, New York, United States
About
The grocery industry is enormous (it accounts for 4% of GDP) — and grocery food waste is a huge cost to grocers' bottom lines, but also to our planet.
Today, we're working with major supermarket chains in the US and Canada, and we've scaled to seven figures in ARR. We're backed by leading investors including Y Combinator, 1984 Ventures, Collaborative Fund, and angels from OpenAI, Instacart, and Citadel Securities.
We've brought together an exceptional team from Palantir, BCG, Oxford, Cambridge, and MIT to solve intellectually challenging problems and tackle food insecurity and waste with technology.
We're looking for talented data engineers in NYC to join our mission.
About the Role
As a Data Engineer at Guac, you'll own the data infrastructure that powers our forecasts: the pipelines that ingest billions of rows of transaction, inventory, and operational data from grocers across the continent, and the systems that turn that data into accurate predictions multiple times a day.
You'll shape how we model new customers' data, build pipelines that scale across chains with hundreds of stores, and work on our ML systems to make them faster and more accurate. You'll occasionally work directly with customers' technical teams to understand their data and business logic — but the bulk of your time is on engineering.
Your responsibilities will include:
Data & Pipelines
Design and build ETL pipelines that process billions of rows of data multiple times per day across customers, using Python, Dagster, and Pub/Sub
Model new customer datasets and own the data layer for new deployments — from raw integration to forecast‑ready
Optimize our ML pipelines for demand forecasting — making them faster, cheaper, and more accurate at scale
Partner with customers' technical teams to understand their data systems and business logic, and translate that into our pipelines
Backend
Contribute to backend services (Python/FastAPI) that power our ordering and production planning products
Build internal tools and APIs that expose forecasts and data to our application layer
Expose our data and systems to LLMs via MCP servers, tool‑use APIs, and similar protocols
About You
3+ years of relevant data engineering experience
Strong proficiency in Python (Pandas, etc.) and SQL
Proven experience designing and implementing ETL systems across large distributed datasets, using orchestration tools like Dagster or Airflow
Comfortable operating with ambiguity and minimal process — you thrive when given a problem and trusted to figure out the solution
AI‑native: you use Claude Code, Cursor, or similar AI coding tools daily and ship significantly faster because of it
(Bonus) Experience optimizing ML pipelines or working closely with ML/forecasting systems
(Bonus) Experience with distributed computing frameworks like PySpark or Dask
What We Offer
First‑hand experience building an early‑stage startup with real ownership
Compensation: $150k–$250k base + competitive equity
Fully employer‑paid healthcare (medical, dental, and vision)
Unlimited vacation days
Fully covered food expenses in the office (lunch/dinner)
Free Equinox membership
Our Tech Stack
Languages & Frameworks: Python, FastAPI, SQL
Data & Pipelines: Dagster, Pub/Sub, BigQuery, Postgres, Dask, Pandas
Cloud & Infrastructure: GCP, Terraform, Docker
AI: MCP servers, Anthropic/OpenAI APIs, agentic tooling
Note: this is an in-person role in NYC, five days a week.
Language skills
- English