About
LinkedIn profile is a must
Hybrid - 3 days a week on site in the Redwood City, CA area (Tuesday-Thursday)
Job Description: The Role
You’ll own the backend systems that make AI agents reliable in production: agent runtime services, integrations, data modeling, observability, and platform reliability. You’ll design, ship, measure, and harden systems that power real customer workflows at scale.
What You’ll Do
Own agent runtime services: tool execution, state management, orchestration, retries/idempotency, rate limiting.
Design APIs & contracts: stable, versioned internal/external APIs, webhooks/events, integration adapters.
Model complex domain data: schemas for agent memory/state, workflow history, audit trails, permissions, multi-tenant isolation.
Build integrations at scale: OAuth, webhooks, sync engines, connectors with robust observability and failure handling.
Reliability engineering: define SLIs/SLOs, implement tracing, timeouts, circuit breakers, budgets, and incident response.
Performance & cost controls: optimize latency/throughput, queues, caches, storage; manage inference/tool-call costs and runaway tasks.
Raise the bar: code quality, testing strategy, on-call hygiene, runbooks, postmortems, mentoring.
What We’re Looking For (Required)
Minimum experience: 7 years
6+ years building backend systems for production SaaS, platforms, or distributed systems.
Strong fundamentals in distributed systems, concurrency, queues/workers, caching, and production ops.
Data modeling depth: relational design (Postgres/MySQL), migrations, indexing, query optimization, data correctness.
API design excellence: clear, evolvable contracts across internal services and external partners.
Thrive in high-velocity environments without compromising reliability/security.
Ownership mindset: build → ship → operate; comfortable with ambiguity and rapid iteration.
LLM product experience: prompting, tool calling, evals, latency/cost tradeoffs.
Agent architectures: planning/execution loops, memory/state, sandboxed tools, HITL, safety constraints.
Frameworks/SDKs: Vercel AI SDK, LangChain/LangGraph, Anthropic Agents, OpenAI tool calling, sandboxed runtimes.
Infra familiarity: Kubernetes, serverless, stream processing, feature stores, vector search.
Additional Information:
Strong AI/LLM and backend developer experience
Languages
- English