Site Reliability Engineer (SRE)

Charles Simon Associates Ltd

United Kingdom

About

Site Reliability Engineer – (SRE, Site Reliability Engineer, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) – Permanent – Remote

Charles Simon Associates are currently recruiting for a Site Reliability Engineer on a permanent basis. This role is for a global business with its HQ in the City of London. Candidates must be British citizens due to security clearance requirements.

Location: Remote, with some travel to London
Salary: Up to £125,000 per annum

Skills/Requirements for the Site Reliability Engineer:

* Extensive SRE experience in previous roles
* Strong Terraform skills
* Proven Kubernetes and AKS experience
* Experience creating and modifying Terraform deployments in live environments
* Experience with monitoring solutions, ideally Datadog, otherwise Azure Application Insights, Log Analytics or Grafana
* Scripting skills for automation in PowerShell, Python or Bash
* Experience with web-based applications

Desirable Skills:

* Knowledge or commercial experience of microservices architecture
* Kanban
* Prior experience of working with Puppet and Chef would be advantageous

The start date for the Site Reliability Engineer is ASAP.

The Site Reliability Engineer will be responsible for:

* Designing and enforcing service-level objectives (SLOs), SLIs, and SLAs to ensure reliability targets are measurable and aligned with business expectations
* Implementing incident response frameworks, including runbooks, postmortems, and blameless RCA processes to drive continuous improvement
* Integrating observability tooling (e.g. Prometheus, Grafana, Datadog, OpenTelemetry) to enable proactive detection and resolution of system anomalies
* Managing infrastructure as code (IaC) using tools like Terraform, Pulumi, or CloudFormation to ensure repeatable, auditable deployments
* Optimizing cost and resource utilization across cloud environments through rightsizing, autoscaling, and lifecycle policies
* Driving chaos engineering initiatives to test system resilience under failure conditions and validate recovery strategies
* Championing security best practices within infrastructure (e.g. secrets management, IAM policies, and vulnerability scanning)
* Collaborating with DevOps and platform teams to build paved-road deployment patterns and internal developer portals
* Leading capacity planning and load testing efforts to anticipate scaling needs and prevent bottlenecks
* Contributing to architectural decisions that impact reliability, latency, and fault domains across distributed systems

Please send an up-to-date copy of your CV to be considered for the Site Reliability Engineer role.

Language skills

English
