About
Job Title: Site Reliability Engineer (AWS)
Location: Dublin / Hybrid (2 days)
Type: Permanent
We’re looking for a Site Reliability Engineer (Mid-Level) who loves solving complex problems, automating everything possible, and keeping systems running smoothly. You’ll be right in the mix of building reliable, scalable, and high-performing cloud environments — mainly in AWS — while helping our dev teams ship great products faster and with fewer headaches.
This is a hands‑on role for someone who values automation, ownership, and collaboration over silos and manual fixes. You’ll also join our on‑call rotation (don’t worry, it’s shared and well‑supported).
What you’ll be doing
- Build and manage robust, highly available AWS infrastructure using tools like Terraform or CloudFormation
- Maintain and improve CI/CD pipelines (Azure DevOps) for automated deployments and testing
- Work with Docker and Kubernetes (EKS/ECS) to orchestrate containerized workloads
- Automate as much as possible — from monitoring and alerting to deployment workflows
- Define and track reliability metrics (SLIs/SLOs/error budgets)
- Dive into incidents, lead root cause analysis, and make sure they don’t happen again
- Build out observability solutions (CloudWatch, Prometheus, Grafana, ELK, etc.)
- Partner with development and security teams to improve app reliability and platform performance
- Keep security tight — IAM roles, secrets management, and network boundaries are second nature
What you’ll bring
- Around 5–7 years in IT, with 3+ years in SRE, DevOps, or Cloud Engineering
- Deep hands‑on experience with AWS (EC2, VPCs, IAM, S3, RDS, CloudWatch, ALB/ELB, Route53)
- Solid experience building CI/CD pipelines with Azure DevOps
- Comfortable managing Linux and/or Windows environments at scale
- Strong background in Docker and Kubernetes — you know your way around clusters, scaling, and deployments
- Skilled with Infrastructure as Code (Terraform or CloudFormation)
- Confident scripting in Bash or Python for automating all the boring stuff
- Experienced in monitoring, logging, and alerting — you believe in metrics, not guesswork
- Understand the core of SRE: SLIs/SLOs, incident management, postmortems, capacity planning
- Always exploring ways to make cloud systems faster, more resilient, and more cost‑efficient
- Obsessed with uptime, reliability, and automation
- Open communicator who thrives in cross‑team collaboration
- Take full ownership of what you build and run
- Always curious about the latest in SRE and cloud‑native tech
Languages
- English