Data Engineer

Guac

United States

About

At Guac, we're solving grocery food waste with AI. We forecast exactly how much of each product will sell, helping grocery retailers order and produce the perfect amount of inventory, and we're building AI-native tools that put those forecasts directly into the hands of store operators and buyers. The grocery industry is enormous (it accounts for 4% of GDP), and grocery food waste is a huge cost both to grocers' bottom lines and to our planet.

Today, we're working with major supermarket chains in the US and Canada, and we've scaled to seven figures in ARR. We're backed by leading investors including Y Combinator, 1984 Ventures, Collaborative Fund, and angels from OpenAI, Instacart, and Citadel Securities. We've brought together an exceptional team from Palantir, BCG, Oxford, Cambridge, and MIT to solve intellectually challenging problems and tackle food insecurity and waste with technology.

We're looking for talented data engineers in NYC to join our mission.

About the Role
As a Data Engineer at Guac, you'll own the data infrastructure that powers our forecasts: the pipelines that ingest billions of rows of transaction, inventory, and operational data from grocers across the continent, and the systems that turn that data into accurate predictions multiple times a day. You'll shape how we model new customers' data, build pipelines that scale across chains with hundreds of stores, and work on our ML systems to make them faster and more accurate. You'll occasionally work directly with customers' technical teams to understand their data and business logic, but the bulk of your time is on engineering.

Your responsibilities will include:

Data & Pipelines

- Design and build ETL pipelines that process billions of rows of data multiple times per day across customers, using Python, Dagster, and Pub/Sub
- Model new customer datasets and own the data layer for new deployments, from raw integration to forecast-ready
- Optimize our ML pipelines for demand forecasting, making them faster, cheaper, and more accurate at scale
- Partner with customers' technical teams to understand their data systems and business logic, and translate that into our pipelines

Backend

- Contribute to backend services (Python/FastAPI) that power our ordering and production planning products
- Build internal tools and APIs that expose forecasts and data to our application layer
- Expose our data and systems to LLMs via MCP servers, tool-use APIs, and similar protocols

About You
- 3+ years of relevant data engineering experience
- Strong proficiency in Python (Pandas, etc.) and SQL
- Proven experience designing and implementing ETL systems across large distributed datasets, using orchestration tools like Dagster or Airflow
- Comfortable operating with ambiguity and minimal process: you thrive when given a problem and trusted to figure out the solution
- AI-native: you use Claude Code, Cursor, or similar AI coding tools daily and ship significantly faster because of it
- (Bonus) Experience optimizing ML pipelines or working closely with ML/forecasting systems
- (Bonus) Experience with distributed computing frameworks like PySpark or Dask

What We Offer
- First-hand experience building an early-stage startup with real ownership
- Compensation: $150k–$250k base + competitive equity
- Fully employer-paid healthcare (medical, dental, and vision)
- Unlimited vacation days
- Fully covered food expenses in the office (lunch/dinner)
- Free Equinox membership

Our Tech Stack
- Languages & Frameworks: Python, FastAPI, SQL
- Data & Pipelines: Dagster, Pub/Sub, BigQuery, Postgres, Dask, Pandas
- Cloud & Infrastructure: GCP, Terraform, Docker
- AI: MCP servers, Anthropic/OpenAI APIs, agentic tooling

Note: this is an in-person role in NYC, five days a week.

Language Skills

English

Note for Users

This job posting comes from a TieTalent partner platform. Applications are submitted directly on that platform's website.
