Backend Engineer (Monitoring)

Apollo Research

London, England, United Kingdom

About

Application deadline: We accept submissions until 16 January 2026. We review applications on a rolling basis and encourage early submissions.
Opportunity
Join our new AGI safety monitoring team and help transform complex AI research into practical tools that reduce risks from AI. As a Backend Engineer, you'll work closely with our CEO, monitoring engineers and Evals team software engineers to build tools that make AI agent safety accessible at scale. We are building tools that monitor AI coding agents for safety and security failures.
You will join a small team, with significant ability to shape the team and its technology and to earn responsibility quickly. This opportunity is for those who care about building tools that genuinely make AI agents safe, who thrive in fast‑paced environments, and who enjoy working closely with researchers.
Key Responsibilities
Infrastructure & Architecture
– Design and implement scalable backend systems capable of processing and analyzing large volumes of AI agent logs in real time.
– Build and maintain data processing pipelines that extract, transform, and store agent trajectory data efficiently.
– Architect database schemas and data models optimized for high‑throughput writes and complex analytical queries.
– Design for reliability, with robust error handling, retry logic, and graceful degradation.
– Monitor system performance and optimize bottlenecks to ensure sub‑second latency for critical monitoring operations.
API Development
– Develop secure, well‑documented RESTful APIs that allow users to integrate our monitoring tools into their workflows.
– Implement authentication, authorization, and rate limiting.
– Build webhook systems and real‑time notification services to alert users about critical safety events.
– Design API interfaces that are intuitive for developers while remaining flexible for diverse use cases.
– Integrate with SIEM systems to stream monitoring alerts and security events into existing security operations workflows.
Data Systems
– Implement efficient storage solutions for structured and unstructured data.
– Build processing systems for real‑time monitoring and batch analysis of historical data.
– Design caching strategies to optimize frequent queries.
– Create data retention and archival policies that balance user needs with storage efficiency.
Monitoring & Observability
– Build comprehensive logging, metrics, and tracing systems.
– Implement alerting systems.
– Create dashboards and tools to help the team understand system behavior.
– Design systems that make debugging production issues straightforward and minimize time‑to‑resolution.
Collaboration & Quality
– Work closely with researchers to understand needs and translate prototypes into production‑ready systems.
– Collaborate with frontend engineers for excellent user experiences.
– Participate in code reviews to maintain high standards.
– Document architectural decisions, API specifications, and system behaviors.
– Contribute to technical discussions about technology choices, trade‑offs, and implementation approaches.
Job Requirements
4+ years of experience building production backend systems at scale.
Strong Python proficiency with experience in frameworks such as FastAPI, Flask, or Django.
Experience designing and implementing RESTful APIs with clear documentation.
Solid understanding of database design and optimization (SQL and/or NoSQL).
Experience with cloud platforms (AWS, Google Cloud, or Azure) and containerization technologies (Docker, Kubernetes).
Experience building data‑intensive applications or processing large‑scale log data.
Strong understanding of system design principles, including scalability, reliability, and security.
Experience with asynchronous processing, message queues, and distributed systems.
Demonstrated ability to write clean, well‑tested, maintainable code.
Bonus
Familiarity with real‑time data processing frameworks (Kafka, Redis Streams, etc.).
Experience with ML/AI infrastructure or building tools for AI applications.
Previous work on developer tools, monitoring systems, or security tools.
Experience with infrastructure‑as‑code (Terraform, CloudFormation, etc.).
Familiarity with AI safety concepts or evaluation frameworks like Inspect.
Contributions to open‑source backend infrastructure projects.
Experience building security‑centric tools.
Experience with code analysis platforms.
Experience with Golang.
Representative Project
Real‑time agent monitoring infrastructure
– Design and build the backend system that processes AI coding agent outputs in real time to detect safety and security issues. Build a scalable ingestion pipeline that accepts agent logs via API, route logs through monitors, implement storage layers, and add a notification system that alerts users to concerning behaviors. Ensure sub‑second p95 latency for critical operations while handling spikes and partial failures.
Benefits
Salary: £100k – £180k GBP (≈ $135k – $245k USD).
Flexible work hours and schedule.
Unlimited vacation and sick leave.
Lunch, dinner, and snacks provided on workdays.
Paid work trips, including staff retreats and conferences.
A yearly $1,000 professional development budget.
Logistics
Start Date: Target 2–3 months after the first interview.
Time Allocation: Full‑time.
Location: London; in‑person role in the LISA office. Remote arrangements considered on a case‑by‑case basis.
Work Visas: We can sponsor UK visas.
About the Team
The monitoring team is new. You will work closely with CEO Marius Hobbhahn, engineer Jeremy Neiman, and others on the monitoring team. You’ll also collaborate with SWEs Rusheb Shah, Andrei Matveiakin, Alex Kedrik, and Glen Rodgers to translate internal tools into publicly usable tools. Interaction with researchers is essential, as we intend to use our tools internally for research.
Equality Statement
Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.
How to Apply
Please complete the application form with your CV. The provision of a cover letter is neither required nor encouraged. Feel free to share links to relevant work samples.
Interview Process
Our multi‑stage process includes a screening interview, a take‑home test (≈3 hours), three technical interviews, and a final interview with CEO Marius. Technical interviews align closely with tasks you would perform on the job. No leetcode‑style general coding interviews.
Privacy and Fairness
We are committed to protecting your data, ensuring fairness, and adhering to workplace fairness principles. AI‑powered tools assist with resume screening but all resumes are screened by a human and final hiring decisions are made by our team. For questions about data processing or fairness concerns, contact privacy@apolloresearch.com.

Languages

English
