About
Duration: 12 months (with possible extension based on performance and business needs)
Location:
Remote: US (occasional travel to Yahoo offices in Sunnyvale, CA, or New York, NY)
Coding test:
Required
Pay:
$85-$100/hour
Role Fit
This role sits within the Yahoo Mail Production Engineering team. Engineers in this role directly support cloud infrastructure reliability, cost efficiency, and automation for one of the world's largest consumer email platforms, serving hundreds of millions of users globally.
Overview of the Team
Yahoo Mail Production Engineering manages GCP-based infrastructure, including GKE clusters, Compute Engine, Dataproc, Vertex AI, and other GCP services. The team is responsible for production reliability, capacity planning, cost optimization, CI/CD pipelines, MLOps, and infrastructure as code across 40+ GCP projects operating at petabyte scale. We work closely with software architects, developers, and product managers to deliver end-to-end results.
Primary responsibilities (daily/weekly)
Operate, monitor, and improve GKE applications, analytics, and ML production workloads
Manage Terraform/Ansible/Helm IaC for GCP resource provisioning and policy enforcement
Participate in on-call rotation for production incidents
Review and improve CI/CD pipelines for services written in Python, Node.js, and Java
Collaborate with architects and developers on infrastructure architecture and design
Automate cloud operations through secure, programmatic solutions
Leverage AI-driven tools, such as development agents, for development, troubleshooting, and automation
Key projects or initiatives for the role
On-prem to GCP migration of large-scale Yahoo Mail workloads
Analytics pipeline and platform reliability improvements (Vertex AI, Generative AI, BigQuery, Looker, Dataproc)
Success metrics or KPIs for this role
On-call incident resolution time and escalation rate (MTTD, MTTR, MTTE)
Terraform/IaC coverage of managed resources
CI/CD pipeline reliability and deployment velocity
Progress on on-prem to GCP migration milestones
Sprint goal achievement (SMART goals per sprint)
Technical (Required)
5+ years in SRE, DevOps, Infrastructure, or Cloud Operations with on-call duties
Proficiency with GCP services: GKE, GCE, networking, security, CI/CD, and other common cloud technologies
IaC proficiency: Terraform, Ansible, and Helm charts
Programming in Python, Node.js, and Java; ability to build CI/CD pipelines for services written in these languages
Knowledge of Linux, TCP/IP, HTTP, mail protocols, DNS, CDNs, and load balancers, including troubleshooting across them
Experience with large-scale production applications, systems, and networks
Technical (Advantageous)
Cloud databases and storage: GCS, Cloud SQL, Spanner, Memorystore
ML/AI platforms: Vertex AI, Generative AI, BigQuery, Looker, Dataproc
Cloud Observability and OpenTelemetry
Proven track record migrating on-prem infrastructure to GCP
Operational experience in both on-prem and cloud environments
Ideal experience level (years, leadership, industries)
5+ years of total cloud/SRE experience, preferably with GCP. Experience at large-scale internet companies running petabyte-scale production data systems is strongly preferred.
Language skills
- English