About
Analyze and optimize model training performance, improving distributed training workflows and maximizing resource utilization across various hardware environments.
Enhance system observability, debuggability, and operational excellence to improve the user experience.
Collaborate with cross-functional teams to integrate cutting-edge features and technologies into the platform.
Essential Skills & Qualifications:
Bachelor's degree or higher in Computer Science or a related field, or equivalent experience.
3+ years of hands-on software engineering experience.
2+ years of specialized experience in AI/ML infrastructure, including facilitating distributed training for large ML models.
Proficient in Python, with strong expertise in frameworks such as PyTorch (preferred), TensorFlow, and others.
Familiar with distributed computing, GPU computing, and cloud platforms (AWS, GCP, Azure).
Willingness to travel to Sunnyvale, CA as needed.
Comfortable navigating ambiguous and dynamic work environments.
Preferred Qualifications:
5+ years of professional software engineering experience.
Self-motivated, with a strong focus on driving results and creating meaningful impact.
Deep knowledge and experience with PyTorch 2.x+ and distributed training frameworks.
Experience in designing training frameworks that support FSDP, Pipeline Parallelism, and other advanced techniques for training large foundational models.
Skilled in profiling, analyzing, debugging, and optimizing training and data loading performance.
Excellent communication skills to facilitate discussions, achieve consensus, manage risks, and provide constructive feedback.
Compensation: The compensation range for this position is between $170,000 and $240,000, based on relevant experience and expertise. The role also comes with incentive pay based on individual and company performance.
Relocation: This position may be eligible for relocation benefits.
Benefits: We offer an extensive range of health and wellness programs, including medical, dental, and vision plans, Health Savings Accounts, retirement savings plans, and more.
This role is primarily remote, but individuals residing within a 50-mile radius of Mountain View, Sunnyvale, Detroit, Warren, or Milford are expected to report to one of these locations at least three times a week, as determined by management.
About GM: At General Motors, our mission is to achieve Zero Crashes, Zero Emissions, and Zero Congestion, and we are committed to leading the transformation necessary to create a safer, more equitable world.
Language skills
- English
Notice to users
This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.