About
Team Overview
As a Site Reliability Engineer II, you will help ensure Klaviyo’s critical platforms are reliable, scalable, and sustainable while enabling rapid product development.
We treat reliability as a core product feature and use software engineering to solve complex systems and operational challenges. Our work spans infrastructure, security, and software engineering, and focuses on building and operating systems that are reliable, secure, and performant at scale.
The SRE team’s charter is to build and operate foundational services and infrastructure, reduce operational toil through automation, and continuously improve systems based on real production learnings. Your work will directly impact how Klaviyo engineers build software and how customers experience our platform every day.
How you’ll make an impact
As a Site Reliability Engineer II, you will contribute to the reliability and operational excellence of Klaviyo’s platforms by working on well-scoped projects and owning services with support from senior engineers. You will:
- Build, operate, and improve production systems with a focus on reliability, scalability, and performance
- Apply software engineering principles to automate operational tasks and reduce manual toil
- Contribute to the design and implementation of systems using established SRE best practices
- Help define and measure SLIs and SLOs for services you support
- Improve observability through metrics, dashboards, logging, and tracing
- Participate in on-call rotations and respond to production incidents with guidance and support
- Assist with incident investigation and contribute to post-incident reviews and follow-up actions
- Perform basic analysis around system behavior, capacity usage, and scaling characteristics
- Identify reliability issues or operational pain points and work with teammates to address them
- Collaborate with product, platform, and security engineers to ship reliable systems
- Write and maintain clear operational runbooks and system documentation
Who you are
You are an early-to-mid career SRE who is comfortable operating production systems and eager to deepen your expertise in reliability engineering.
You:
- Have experience operating cloud-native production systems and services
- Write production-quality code (e.g. Python, Go, or similar) to automate operations and improve reliability
- Understand common failure modes in distributed systems, such as dependency failures, resource exhaustion, and partial outages
- Have experience working with containerized workloads and platforms (e.g. Kubernetes) in production environments
- Are comfortable participating in on-call rotations and diagnosing straightforward production issues
- Have experience using observability tools and responding to alerts
- Are familiar with SRE concepts such as SLIs, SLOs, and error budgets, and are learning how to apply them in practice
- Have hands‑on experience with infrastructure as code or declarative configuration (e.g. Terraform, Kubernetes manifests)
- Can follow incident response processes and contribute meaningfully during outages
- Are comfortable receiving feedback, learning from incidents, and improving your systems over time
- Have already experimented with AI in work or personal projects, are excited to dive in and learn fast, and are eager to responsibly explore new AI tools and workflows that make your work smarter and more efficient
Nice to have
- Experience supporting security-sensitive systems or internal platforms
- Familiarity with AWS or other cloud providers
- Exposure to messaging or asynchronous systems (e.g. Kafka, RabbitMQ, Celery)
- Interest in performance testing, capacity planning, or resilience work
- Practical experience with algorithms and data structures
Get to Know Klaviyo
Klaviyo’s platform is primarily built with Python and React and runs on AWS. Engineers join us from a wide range of technical backgrounds and are supported in learning our stack.
Core technologies include:
- MySQL / Redis / Memcached
Location & Work Model
This role is based in Dublin, Ireland and follows a hybrid working model. Klaviyo supports work authorization and relocation for this position.
At Klaviyo, we value people who take ownership, learn continuously, and collaborate openly. We are committed to building inclusive teams and encourage applications from candidates of all backgrounds.
AI fluency at Klaviyo includes responsible use of AI (including privacy, security, bias awareness, and human-in-the-loop). We provide accommodations as needed.
Klaviyo is committed to a policy of equal opportunity and non-discrimination. We do not discriminate on the basis of race, ethnicity, citizenship, national origin, color, religion or religious creed, age, sex (including pregnancy), gender identity, sexual orientation, physical or mental disability, veteran or active military status, marital status, criminal record, genetics, retaliation, sexual harassment or any other characteristic protected by applicable law.
IMPORTANT NOTICE: Our company takes the security and privacy of job applicants very seriously. We will never ask for payment, bank details, or personal financial information as part of the application process. All our legitimate job postings can be found on our official career site. Please be cautious of job offers that come from non-company email addresses (@ ), instant messaging platforms, or unsolicited calls.
By clicking "Submit Application" you consent to Klaviyo processing your Personal Data in accordance with our Job Applicant Privacy Notice. If you do not wish for Klaviyo to process your Personal Data, please do not submit an application. You can find our Job Applicant Privacy Notice here and here (FR).
Language skills
- English
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.