This job offer is no longer available

Site Reliability Lead

CBL Solutions
  • US
    Atlanta, Georgia, United States

About

Job Title: SRE Lead

Location: Atlanta, GA

Hybrid: work from office Thursday to Wednesday, on alternate weeks

Contract Position

Key Skills:

  • Chaos Testing
  • Resiliency solutions
  • Node.js
  • Observability strategy
  • AWS Serverless

Client Consulting:

  • Work with the team to define the SRE maturity model and observability strategy, identify gaps, and build the AWS reliability roadmap.
  • Translate business SLAs into SLIs, SLOs, and error budgets.

Architecture & Design:

  • Lead and implement AWS serverless reliability architecture (multi-region, failover, self-healing).
  • Define observability blueprints (logs, metrics, traces, UX telemetry).
  • Define cost-optimized data observability and resiliency solutions.

Reliability & Resilience

  • Design and implement fault-tolerant, highly available AWS architectures.
  • Experience with DynamoDB global tables, RDS failovers, and capacity planning.
  • Apply SRE principles: SLIs, SLOs, SLAs, error budgets, and toil reduction.
  • Drive chaos engineering, disaster recovery, and capacity planning exercises.

Observability & Monitoring

  • Experience in implementing end-to-end observability (logs, metrics, traces, events).
  • Build cost-optimized unified dashboards and custom metrics using Dynatrace and CloudWatch.
  • Experience in implementing data observability and resiliency solutions.
  • Automate alerts, anomaly detection, and incident response workflows.

Automation & Infrastructure

  • Develop automation and custom tooling using Python and .
  • Build infrastructure as code using AWS CDK and CloudFormation.
  • Implement self-healing and auto-remediation solutions with AWS serverless services.

Operations & Incident Management

  • Implement AI/ML-driven automation.
  • Collaborate with developers for shift-left observability and performance optimization.
  • Guide and lead adoption of automation, proactive observability, and self-healing systems.

Languages

  • English