Data Engineer

Humoniq (YC S25)

Mission Viejo, California, United States

About

Who We Are
We are a YC-backed startup with $8M+ raised, led by repeat founders who've built and scaled successful companies before. Our mission is ambitious: we're building deeply integrated AI systems that understand, reason, and act to solve real-world problems in travel and transport. We're not another "move fast and burn out" shop. We believe peak productivity comes when humans have psychological safety, time to sleep, move, eat well, and be understood. That's the culture we're building. We don't believe in overwork or equating hours with outcomes. What matters is results tied to business and customer outcomes—nothing else.

What makes us different:

We don't worship grind culture. We believe peak output comes when people are well-rested, strong, loved, safe, and understood.

Sleep > All-nighters

Exercise & health > Burnout & "hustle"

Psychological safety > Fear & politics

Because humans at their best → happy, motivated, and productive

Location: Mission Viejo, CA (Los Angeles outskirts)

As our Data Engineer, you'll build the pipelines and tools that let us:

Ingest and analyze thousands of AI-driven support conversations
Run regression tests on new prompts and models before they hit production
Detect drift in user behavior and model outputs before customers feel it

You'll sit at the intersection of data engineering, ML evaluation, and backend infra. You won't be tuning models all day; you'll be building the systems that make tuning safe and fast.

You'll work closely with:

Min (AI lead) on evaluation design and metrics
Victor (technical product/backend lead) on schemas, APIs, and internal tools
Farzad (COO) on priorities and impact

If you forget everything else, remember this:
"If I make it easy for the team to see, measure, and trust that the AI is taking the highest quality actions at scale, and actively improving the AI when needed, I'm winning."

What you'll do

In your first 6–12 months, you'll:

Build a log ingestion pipeline
  Ingest GCP Cloud Run / application logs into a central store (BigQuery / Postgres)
  Parse logs into ticket-level and message-level records
  Join in evaluator comments and metadata so we can analyze behavior end-to-end
Ship AI regression tests and evaluations
  Re-run historical conversations through new prompts / models
  Compare End-of-Conversation classification/Issue/Task action-plan outputs over time
  Generate clear reports that show regressions, hallucinations, and wins
  Improve our AI agents through prompting and other changes
Implement drift detection
  Track distributions of intents, outcomes, and actions over time
  Detect when user behavior or model outputs deviate from baseline
  Surface drift in dashboards and alerts so we can act before customers are hurt
Build internal dashboards & tools
  Let evaluators and product see problem tickets quickly
  Make it trivial to search for "all conversations where X went wrong"
  Visualize trends so we stop arguing anecdotes and start arguing data
Own reliability + documentation
  Add monitoring and alerting around your pipelines
  Document your data models, assumptions, and runbooks
  Make it possible for someone new to pick up your work and move forward

You might be a fit if…

You've owned a data / infra pipeline in production before, not just written a script.
You're comfortable in Python and have used it for ETL, log parsing, or analytics.
You've worked with cloud infra (GCP preferred; AWS/Azure okay if you can translate).
You've used data warehouses or analytical databases like BigQuery / Snowflake / Postgres with non-trivial schemas.
You think in terms of metrics and failure modes:
"What happens if the schema changes?"
"How will we know if this silently stops working?"
"What's the rollback if this regression job reveals something bad?"

You don't need to be an ML research person. We care more that you can:

Take messy logs and turn them into structured, usable data
Design evaluation flows that are repeatable and automatable
Make it obvious when things are getting better or worse

Must-haves

Explicit and demonstrable experience in backend, data engineering, or ML infra (or equivalent real-world work)
Strong Python skills for scripting and small services
Experience with at least one cloud platform (GCP ideal)
Experience building and operating ETL / data pipelines in production
Comfort with SQL and analytical databases (BigQuery, Snowflake, Redshift, or similar)
Clear written communication and willingness to document decisions

Nice-to-haves

Experience with:
  GCP Cloud Run / Cloud Logging / Pub/Sub / Cloud Scheduler
  BigQuery specifically
  Data orchestration tools (Airflow, Dagster, Prefect, dbt, etc.)
Experience with observability stacks (Grafana, Prometheus, OpenTelemetry, etc.)
Familiarity with LLMs, prompt evaluation, or ML monitoring

How we work

Small team, high ownership — you won't be a cog.
We care about results, not hours.
We give direct feedback, quickly.
We expect you to push back with reasons, not vibes.

Language skills

English
