This job posting is no longer available
About
Key Responsibilities
- Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices.
- Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources, incorporating cost-efficient design principles.
- Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of solutions.
- Define SRE standards, best practices, and guidelines for adoption across teams.
- Establish SRE metrics such as SLIs and SLOs.
- Apply software engineering best practices, including version control, code reviews, test-driven development, and documentation.
- Participate in incident management and the on-call rotation: provide technical support, troubleshoot production issues, and collaborate with teams to reduce incident recurrence.
- Stay current with emerging AWS services, SRE methodologies, and cloud-native development technologies, and drive adoption of innovative solutions.
- Collaborate with cross-functional teams within Agile and Scaled Agile frameworks to deliver integrated cloud automation solutions.
- Produce clear, blameless postmortems with actionable items and documented failure scenarios.
Required Qualifications
- Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent experience.
- 7+ years of software development experience with a focus on reliability and platform engineering.
- 5+ years of advanced Python development, with proven experience building enterprise-grade, highly available tools, APIs, and utilities.
- 3+ years of hands-on experience developing solutions in AWS environments, with a deep understanding of core services (EC2, VPC, S3, Lambda, IAM, CloudFormation, EventBridge, Step Functions, etc.) and resource cost optimization.
- 3+ years of experience applying SRE principles, including observability, toil automation, SLIs/SLOs, and reliability engineering.
- Expert-level proficiency with Infrastructure as Code (IaC) using Terraform, including module development and state management.
- Strong experience with CI/CD pipelines, automated testing frameworks, and DevOps practices.
- Experience with observability tools and practices, including Grafana, AWS CloudWatch, and AWS Canary.
- Experience defining, implementing, and managing SLOs/SLIs and error budgets; familiarity with conducting RCAs and producing postmortem documentation.
- Working experience in Agile and Scaled Agile environments, and familiarity with ITSM processes (incident, change, and problem management), resilience testing, and chaos engineering practices.
Preferred Qualifications
- Experience with Go (Golang) or additional programming languages.
Certifications
- None specified.
Language Skills
- English