About
Duration:
12 months (with possibility of extension based on performance and business needs)
Location:
Remote: US (occasional travel to Yahoo offices in Sunnyvale, CA or New York, NY)
Coding test:
Required
Pay:
$85-$100/Hr
How does this role fit within the team/department?
This role sits within Yahoo Mail's Production Engineering team. Engineers in this role directly support cloud infrastructure reliability, cost efficiency, and automation for one of the world's largest consumer email platforms, serving hundreds of millions of users globally.
Overview Of The Team
Yahoo Mail Production Engineering manages GCP-based infrastructure, including GKE clusters, Compute Engine, Dataproc, Vertex AI, and other GCP services. The team is responsible for production reliability, capacity planning, cost optimization, CI/CD pipelines, MLOps, and infrastructure-as-code across 40+ GCP projects at petabyte scale. We work in close collaboration with software architects, developers, and product managers to deliver end-to-end results.
Primary responsibilities (daily/weekly)?
- Operate, monitor, and improve GKE apps, Analytics, and ML production workloads
- Manage Terraform/Ansible/Helm IaC for GCP resource provisioning and policy enforcement
- Participate in on-call rotation for production incidents
- Review and improve CI/CD pipelines for services deployed in Python, Node.js, and Java
- Collaborate with architects and developers on infrastructure architecture and design
- Automate cloud operations through programmable and secure solutions
- Leverage AI-driven tools for development agents, troubleshooting, and automation
Key projects or initiatives for the role?
- On-prem to GCP migration of large-scale Yahoo Mail workloads
- Analytics pipeline and reliability improvements
- Platform work (Vertex AI, Generative AI, BigQuery, Looker, Dataproc)
Success metrics or KPIs for this role?
- On-call incident resolution time and escalation rate (MTTD, MTTR, MTTE)
- Terraform/IaC coverage of managed resources
- CI/CD pipeline reliability and deployment velocity
- Progress on on-prem to GCP migration milestones
- Sprint goal achievement (SMART goals per sprint)
Technical (Required)
- 5+ years in SRE, DevOps, Infrastructure, or Cloud Operations with on-call duties
- GCP services proficiency: GKE, GCE, Networking, Security, CI/CD, and common cloud technologies
- IaC proficiency: Terraform, Ansible, and Helm Charts
- Programming in Python, Node.js, and Java; ability to build CI/CD pipelines in these languages
- Linux, TCP/IP, HTTP, mail protocols, DNS, CDN, load balancers, and troubleshooting
- Experience with large-scale production applications, systems, and networks
Technical (Advantageous)
- Cloud databases and storage: GCS, Cloud SQL, Spanner, Memorystore
- ML/AI platforms: Vertex AI, Generative AI, BigQuery, Looker, Dataproc
- Cloud Observability and OpenTelemetry
- Proven track record migrating on-prem infrastructure to GCP
- Operational experience in both on-prem and cloud environments
Ideal experience level (years, leadership, industries)?
5+ years of total cloud/SRE experience, with a preference for GCP. Experience at large-scale internet companies running petabyte-scale production data systems is strongly preferred.
Languages
- English