DevOps Engineer

Cosine

Harrow on the Hill, England, United Kingdom

About

Overview
Job title: DevOps Engineer
Location: London; full in-office working as default
Start date: ASAP
Reports to: CTO
Compensation: £60k–90k + equity
Cosine at a glance
At Cosine, we’re building autonomous AI engineers that plan, write, and ship code inside real development workflows. Cosine is designed for on-premise and virtual private cloud (VPC) deployments, including fully air-gapped environments. We build our agent tooling entirely in-house and post-train open-source models to deliver reliable, enterprise-grade coding performance in security-critical settings. In 2024, Cosine achieved a 72% score on OpenAI’s SWE-Lancer benchmark, placing us among the strongest real-world software-engineering AI systems evaluated. YC-backed and well-funded, Cosine was founded by experienced operators focused on building dependable, production-grade AI.
This role is based in our Hoxton office, five days a week, because close collaboration, fast feedback, and shared context matter for the problems we’re solving.
The role
We’re looking for a DevOps / Senior Platform / Infra Engineer to own the core infrastructure that powers Cosine’s products, from Kubernetes and deployment pipelines to networking and platform services.
You’ll design and run the “paved road” that our engineers, researchers, and customers build on: reliable Kubernetes clusters, fast and safe CI/CD, solid observability, and hardened environments for demanding enterprise and on-prem deployments. You’ll also wear a classic “DevOps/SRE” hat: thinking in SLOs, running incident response, and keeping us up even as we move quickly.
This is a high-ownership role at a fast-paced, venture-backed Silicon Valley startup. You’ll work directly with founding engineers and leadership, and your decisions will materially shape how we build and ship products.
What You’ll Do
Own core infrastructure
Design, operate, and evolve our Kubernetes-based platform (EKS or similar), including cluster topology, node groups, autoscaling, and multi-environment isolation.
Manage supporting cloud resources: container registries, load balancers, queues, caches, and data infra needed to run our APIs and agents.
Build the deployment & tooling layer
Design and maintain CI/CD pipelines for image builds and infra rollouts (e.g. Pulumi/Terraform + Helm/Docker).
Implement safe rollout strategies (blue/green, canary, staged rollouts) and fast rollback paths.
Build internal tools and abstractions that make it easy for product teams to self-serve infra safely.
Own reliability & operations (SRE-ish)
Define and track SLOs/SLIs for key services (latency, error rates, availability).
Improve our observability stack (metrics, logs, traces, alerts) so issues are obvious, actionable, and debuggable.
Participate in the on-call rotation, lead incident response when needed, and drive blameless post-mortems and fixes.
Shape networking & security
Design and maintain networking: VPCs, subnets, ingress/egress, service meshes / L7 routing, DNS, and TLS.
Implement least-privilege access via IAM, secure secret management, and hardened configurations for multi-tenant and isolated customer environments.
Help design patterns for secure enterprise and on-prem / regulated deployments.
Partner with product & research
Work closely with application, ML, and research teams to understand their needs and translate them into reusable infra building blocks.
Provide guidance on “how to run this in production” — capacity planning, failure modes, and operational readiness reviews.
What We’re Looking For
Have strong experience
5+ years building and operating production infrastructure on a major cloud (AWS, GCP, or Azure).
Significant hands-on experience running Kubernetes in production (EKS/GKE/AKS or self-managed):
Cluster upgrades, autoscaling, node group design, and multi-env setups.
Helm or similar for packaging services.
Think in infrastructure-as-code
Deep experience with IaC tools (Pulumi, Terraform, CDK, or similar).
Comfortable managing infra changes via code review, CI, and automated rollouts.
Care deeply about reliability
Have owned the uptime and performance of user-facing systems.
Comfortable participating in (and improving) on-call rotations and incident management.
Experience setting up / tuning observability (Prometheus, Grafana, CloudWatch, OpenTelemetry, etc.).
Build great tooling & abstractions
You’ve built internal tools, libraries, or platforms on top of cloud providers so product teams can move faster with fewer foot-guns.
You think about developer experience and “golden paths,” not just raw infra.
Are comfortable in code
Strong scripting and programming skills in at least one modern language (e.g. TypeScript, Go, Python).
Happy to dive into app code when needed to debug a production issue or improve an integration.
Have the startup mindset
Enjoy working in a fast-moving environment with evolving priorities and incomplete specs.
Bias toward pragmatic solutions: ship something small, measure, iterate.
Communicate clearly, give/receive direct feedback, and collaborate across functions.
Nice To Have (Not Required)
Experience with:
AWS primitives like EKS, ECS/Fargate, ECR, SQS, ElastiCache/Redis.
Argo CD or other GitOps tools for Kubernetes.
On-prem, air-gapped, or regulated industry deployments (e.g. finance, healthcare).
AI/ML infrastructure (GPU workloads, model hosting, feature stores).
Prior experience as an early infra / platform hire at a startup.
Cosine is an equal opportunity employer
We value diverse backgrounds, perspectives, and ways of thinking, and we’re committed to creating an inclusive and respectful workplace. We encourage applications from anyone who meets the role requirements, even if you don’t meet every single qualification. If you need reasonable adjustments at any stage of the hiring process, we’re happy to discuss them.
Compensation, Benefits & Ways Of Working
We’re an in-office team, five days a week, by design. We believe the work we’re doing benefits from being together, collaborating closely, and building shared context.
What You Can Expect
Competitive salary, benchmarked to the market
Equity / share options, so you share in the upside you help create
30 days’ holiday + bank holidays
Genuine 9–5 working hours — we don’t expect late nights or weekend work
Work hard in the office, collaborate closely, and switch off properly
Dog-friendly office — bring your dog to work
Daily lunch provided
Monthly team breakfasts
Monthly socials
Pension
High-quality equipment to do your best work
We care about focus, sustainability, and doing great work — not performative overwork. We value people who show up, contribute thoughtfully, collaborate well with their colleagues, and then go home.
This role won’t suit everyone. But if you want structure, clarity, strong collaboration, and a team that takes both the work and work-life balance seriously, it’s a great place to be.
Agency & Data Protection Notice
To comply with UK GDPR and our internal data-protection and equal-opportunity obligations, we only accept candidate applications and agency submissions via our Applicant Tracking System (ATS). This ensures appropriate privacy notices, lawful processing, auditability, and consistent retention controls. Any CVs or candidate details received outside the ATS (including via email, Slack, or direct message) will be treated as unsolicited, will not be considered as part of the recruitment process, and will not give rise to any fee or payment obligation.

Language skills

English
