This job posting is no longer available
Senior SRE / DevOps Engineer (Atlanta)
Broad Reach Partners
- United States
About
We are seeking a Site Reliability Engineer to join our team in Atlanta and play a key role in enhancing the stability, performance, and reliability of our production systems. You’ll work closely with development, DevOps, and security teams to improve observability, optimize system performance, and ensure production readiness. From monitoring to automation, you’ll make a direct impact on our cloud infrastructure and service reliability.
In this role, you will work hand‑in‑hand with our development, operations, and security teams worldwide to implement best practices, automate deployments, and ensure our platforms are reliable, secure, and scalable. Kubernetes troubleshooting is required and calls for a deep understanding of pods, nodes, networking, scaling, logs, and service‑to‑service communication.
This role requires a deep understanding of SRE best practices and a strong ability to troubleshoot complex issues.
Responsibilities
Maintain and enhance monitoring tools (New Relic, Graylog) for service health and performance metrics.
Implement and maintain high‑availability systems with capacity planning, performance optimization, and fault tolerance.
Define and monitor Service Level Indicators, Objectives, and Agreements with teams.
Deploy and manage Kubernetes workloads on AWS EKS using Helm and ArgoCD.
Automate operational processes to reduce manual interventions.
Manage Kubernetes workloads on AWS EKS for secure and stable deployments.
Participate in on‑call rotation, troubleshoot production issues, and implement permanent fixes.
Work with DevOps to improve CI/CD pipelines and with development teams to embed resilience and observability.
Document operational runbooks, escalation procedures, and production playbooks.
Required Skills and Experience
8+ years of experience as a Site Reliability Engineer, or equivalent.
Experience with tools like New Relic for monitoring and Graylog for logging.
3+ years of experience with Amazon Web Services (AWS) or Microsoft Azure.
3+ years of experience with Kubernetes clusters, including performance monitoring in Kubernetes.
Proficiency with public cloud environments (AWS preferred).
Proficiency in a scripting language such as Bash, Groovy, or Python.
Excellent debugging and troubleshooting skills.
Ability to prioritize tasks efficiently and independently under minimal supervision.
Nice to Have
Familiarity with .NET applications.
Knowledge of Terraform, Ansible, and monitoring tools.
This is a full‑time role. Unfortunately, we are unable to provide sponsorship, so you must be a US citizen or a green‑card holder. You must currently live in the Atlanta area, as you will need to come into our Atlanta office once or twice a month for key meetings with our team.
If you thrive on solving complex technical challenges, have a passion for automation, and want to influence how enterprise platforms evolve and modernize, this is an ideal opportunity for you.
Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Information Technology
Industries: Staffing and Recruiting
Benefits
Medical insurance
Vision insurance
401(k)
Language Skills
- English