About
You will report to the Sr. Staff Product Operations Engineer, Cloud Operations.
Technology You'll Use: AWS, Azure, GCP
Your Role and Responsibilities: Here's What You'll Do
Architect & Strategize: Lead the design of our next-generation deployment architecture for a microservices-based platform. Drive technological choices for team tooling and infrastructure, ensuring long-term scalability and reliability.
AIOps: Implement AIOps frameworks to improve operational tasks and enhance system self-healing capabilities.
Develop CI/CD Pipelines: Design, manage, and scale our CI/CD pipelines using tools like Jenkins, Git, and GitHub to allow rapid, reliable, and automated software delivery.
Ensure Uptime: Take ultimate ownership of our production environment's stability. Lead end-to-end incident management, from escalation to Root Cause Analysis (RCA). Manage patching, upgrades, and disaster recovery processes. This role includes participation in a 24x7 on-call rotation to support critical uptime.
Automate & Operate: Engineer and own a world-class observability stack (e.g., Prometheus, Grafana, CloudWatch, ELK). Develop automation scripts and frameworks to streamline operational tasks and enhance system self-healing capabilities.
Mentor & Lead: Act as a technical leader and mentor for the team. Share your expertise, establish best practices, and improve the technical capabilities of the entire team.
You have a deep passion for solving complex infrastructure and scalability challenges in a distributed systems environment.
You have demonstrated experience ensuring the uptime and reliability of critical production systems.
You are an adept cross-cultural collaborator, comfortable working in a distributed, multicultural team environment (France/India).
You are disciplined and responsible when working in a remote or hybrid work model.
What We'd Like to See
AIOps & Automation: Experience using observability data for AIOps programmes. Familiarity with applying statistical analysis or machine learning models for predictive monitoring, anomaly detection, and automated root cause analysis.
Infrastructure as Code (IaC): Mastery of tools like Terraform or CloudFormation. Experience with configuration management tools like Ansible, Chef, or Puppet.
Scripting & Automation: Expert-level proficiency in at least one scripting or query language (Python, Bash, MongoDB queries) with a portfolio of successful automation projects.
CI/CD: Deep experience building CI/CD pipelines and deployment tools (Jenkins, Git, GitHub).
Observability: Hands-on experience building monitoring/logging for distributed systems (Prometheus, Grafana, CloudWatch).
Containerization: Understanding of, and practical experience with, Docker and Kubernetes (or other orchestrators).
Networking & OS: Understanding of Unix/Linux fundamentals and advanced TCP/IP networking concepts (DNS, Load Balancers, Firewalls, VPC/VNet).
Role Essentials
Bachelor of Science (BSc) degree in Engineering, Computer Science, or a related technical field.
8+ years of progressive experience in DevOps, SRE, or Cloud Platform Engineering, with at least 3 years in a senior role managing large-scale production environments.
Deep, hands-on expertise in at least one major public cloud (AWS, Azure, or GCP) and production experience with at least one other. Experience with Oracle Cloud Infrastructure (OCI).
Experience supporting microservices-based architectures in a production environment.
Language Skills
- English
Notice to Users
This job posting comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.