This job offer is no longer available
DevOps Engineer (Founding Team)
Fabrion
- San Francisco, California, United States
About
- Build and maintain scalable cloud infrastructure across AWS/GCP/Azure with a focus on secure, tenant-isolated deployments
- Own and evolve CI/CD systems (e.g. GitHub Actions, ArgoCD) with progressive rollout, testing, and rollback flows
- Establish observability tooling across services, agents, and pipelines (OpenTelemetry, Prometheus, Grafana, Sentry)
- Implement policy-as-code (OPA, Rego) for deployment safety, RBAC, audit logging, and approval workflows
- Define and enforce SLAs, uptime targets (99.99%+), incident response, and remediation workflows
- Secure infrastructure: IAM, VPC, encryption, key management, image scanning, secrets rotation
- Automate deployments, infrastructure provisioning (Terraform, Helm), and environment replication
What We’re Looking For
Core Experience:
- 4–10+ years in DevOps, platform engineering, or SRE in production-grade systems
- Strong experience with Docker, Kubernetes (EKS/GKE), Terraform or Pulumi
- Hands-on experience deploying and monitoring distributed cloud-native systems
- Familiar with GitOps practices, CI/CD design, progressive delivery, and secure SDLC
- Clear understanding of how to implement monitoring, alerting, and failure simulation in dynamic environments
Engineering Mindset:
- Obsessed with reliability, latency, uptime, and repeatability
- Security-aware and compliance-conscious
- Proactive: you don’t wait for alerts to fix things
- Comfortable collaborating with backend, AI, and data teams
Bonus: Agent-Native / ML Ops Capabilities
We’re building an agentic, AI-native platform from the ground up. Experience here isn’t required, but would be a strong differentiator:
- Experience running LLM orchestration frameworks (e.g. LangChain, LangGraph, Dust, ReAct agents)
- Building retrieval-augmented generation (RAG) pipelines and deploying them safely and repeatably
- Familiarity with vector DBs (Weaviate, Qdrant, Pinecone) and embedding pipelines
- Monitoring and governing long-running or multi-agent chains
- Auditability and replay systems for agent decision-making
- Serving fine-tuned or open-source LLMs with model versioning and GPU scaling (e.g. vLLM, TGI)
- Interest in auto-remediation using agents (e.g. observability + alert → insight → response via LLM)
Why This Role Matters
DevOps is the nervous system of the platform: every agent, every data fabric component, every pipeline flows through what you build. This is a rare opportunity to design that system early, the right way, and future-proof it for scale, compliance, and trust. If you're excited by intelligent systems, distributed data, and deeply technical infrastructure problems, and you want your work to have immediate real-world impact, we’d love to hear from you.
Languages
- English