Cloud Platform Site Reliability Engineer

Alibaba Cloud
  • US
    Sunnyvale, California, United States

About

Mission of the Cloud Intelligence Group SRE Team

The mission of the Cloud Intelligence Group SRE (Site Reliability Engineering) Team is to ensure the stability of production environments, enterprise-grade cloud data reliability, and service continuity for the Cloud Intelligence Group. Our greatest challenge lies in guaranteeing uninterrupted business operations for cloud-based customers and achieving availability that exceeds 99.99%.

Objectives of the Cloud Intelligence Group SRE Team

Our goal is to establish a systematic stability assurance framework that integrates technology and management, including but not limited to:

  1. Developing stability standards and metrics

* Covering robust architecture, R&D quality, release management, production environment operations, and more.

* Embedding stability into Alibaba Cloud's technical R&D system.

  2. Driving major stability governance campaigns

* Initiatives such as full-stack disaster recovery, phased change rollout, the emergency response mechanism (1-minute alerting, 5-minute triage, 10-minute recovery), and financial-loss prevention.

* Rapidly and continuously mitigating stability risks.

  3. Building a stability-focused technical platform

* Platform capabilities for unattended change management, red/blue team drills, emergency collaboration, risk and vulnerability inspection, and monitoring/alerting.

* Simplifying stability engineering through automation and tooling.

  4. Executing production incident management

* Emergency response, cross-team coordination, root cause analysis, rapid recovery, and post-incident reviews to drive systemic improvements.

  5. Ensuring stability for large-scale customer events

* Technical and operational support for critical activities such as the Olympics and customers' peak business periods.

  6. On-call responsibilities

* Responding to customer issues within Service Level Agreement (SLA) timeframes, resolving problems proactively, and enhancing customer experience.

Cloud Intelligence Group is responsible for Alibaba Group's core technologies and business innovation in the high-tech sector, and is dedicated to building an enterprise-level cloud computing service platform for the digital economy era. It provides leading technology solutions and services globally, operating at massive business scale and delivering complex enterprise-level cloud computing services.

The mission of the Cloud Intelligence Group's SRE team is to ensure the stability of the production environment, the reliability of enterprise-level cloud computing data, and service continuity. Guaranteeing the uninterrupted operation of cloud-based customers' businesses and achieving availability exceeding 99.99% is a significant challenge for the team.

Responsibilities

The objective of the Cloud Intelligence Group's SRE team is to establish a systematic stability assurance framework that integrates technology and management, including but not limited to:

  1. Daily operations and maintenance of applications, databases, and middleware, as well as troubleshooting and answering customer inquiries;

  2. Collaborating with R&D to develop critical support plans based on customer business requirements during peak periods, including preparation during the standby period, on-duty support during critical periods, and post-standby review.

The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.

If hired, employee will be in an "at-will position" and the Company reserves the right to modify base salary (as well as any other discretionary

Language skills

  • English

Notice to users

This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their site.