Cloud Platform Site Reliability Engineer
- Sunnyvale, California, United States
About
Mission of the Cloud Intelligence Group SRE Team
The mission of the Cloud Intelligence Group SRE (Site Reliability Engineering) Team is to ensure the stability of production environments, enterprise-grade cloud data reliability, and service continuity for the Cloud Intelligence Group. Our greatest challenge lies in guaranteeing uninterrupted business operations for cloud-based customers and achieving availability that exceeds 99.99%.
Objectives of the Cloud Intelligence Group SRE Team
Our goal is to establish a systematic stability assurance framework that integrates technology and management, including but not limited to:
- Developing stability standards and metrics
* Covering robust architecture, R&D quality, release management, production environment operations, and more.
* Embedding stability into Alibaba Cloud's technical R&D system.
- Driving major stability governance campaigns
* Initiatives such as full-stack disaster recovery, phased change rollout, the emergency response mechanism (1-minute alerting, 5-minute triage, 10-minute recovery), and financial-loss prevention.
* Rapidly and continuously mitigating stability risks.
- Building a stability-focused technical platform
* Platform capabilities for unattended change management, red/blue team drills, emergency collaboration, risk and vulnerability inspection, and monitoring/alerting.
* Simplifying stability engineering through automation and tooling.
- Executing production incident management
* Emergency response, cross-team coordination, root cause analysis, rapid recovery, and post-incident reviews to drive systemic improvements.
- Ensuring stability for large-scale customer events
* Technical and operational support for critical activities such as the Olympics and customers' peak business periods.
- On-call responsibilities
* Responding to customer issues within Service Level Agreement (SLA) timeframes, resolving problems proactively, and enhancing customer experience.
Cloud Intelligence Group carries Alibaba Group's core technologies and business innovations in the high-tech sector and is dedicated to building an enterprise-level cloud computing service platform for the digital economy era. It provides leading technology solutions and services globally and operates complex enterprise-level cloud computing services at massive scale.
The mission of the Cloud Intelligence Group's SRE team is to ensure the stability of the production environment, the reliability of enterprise-level cloud computing data, and service continuity. A significant challenge we face is guaranteeing the uninterrupted operation of cloud-based customers' businesses and achieving availability exceeding 99.99%.
Responsibilities
The objective of the Cloud Intelligence Group's SRE team is to establish a systematic stability assurance framework that integrates technology and management, including but not limited to:
- Daily operations and maintenance of applications, databases, and middleware, as well as troubleshooting and answering customer inquiries.
- Collaborating with R&D to develop critical support plans based on customer business requirements during peak periods, including preparation during the standby period, on-duty support during critical periods, and post-standby review.
The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.
If hired, employee will be in an "at-will position" and the Company reserves the right to modify base salary (as well as any other discretionary
Language skills
- English