About
You will:
Analyze and report on ML resource (compute, storage, accelerators) usage, costs, and efficiency trends across different teams and projects.
Develop, maintain, and improve dashboards and tools for monitoring key resource metrics and providing actionable insights.
Identify and investigate opportunities for resource optimization, cost reduction, and performance improvements in ML workflows.
Support and guide ML engineers and researchers on best practices for resource utilization.
Contribute to the development, documentation, and enforcement of resource management policies and best practices.
You have:
Bachelor's degree in Computer Science, Engineering, or a related field, and 2+ years of equivalent experience
Experience with distributed systems principles and with building distributed systems for production environments
Solid Python or C++ skills
Experience monitoring, debugging, and troubleshooting complex distributed systems
Experience communicating updates and resolutions to customers and other partners
We prefer:
Experience with compute and storage management for medium to large organizations
The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. During the hiring process, your recruiter can share more about the specific salary range for the role's location or, if the role can be performed remotely, for your preferred location. Waymo employees are also eligible to participate in Waymo’s discretionary annual bonus program, equity incentive plan, and generous Company benefits program, subject to eligibility requirements.
Salary Range: $170,000 to $216,000 USD
Languages
- English