About The Company
Red Hat is a global leader in enterprise open-source software solutions, renowned for its innovative approach to delivering high-performing Linux, cloud, container, and Kubernetes technologies. Operating across more than 40 countries, Red Hat fosters a culture of collaboration, transparency, and inclusivity, empowering its employees to contribute their ideas and solve complex technical challenges. The company's community-powered model emphasizes open source principles, ensuring that the best ideas can originate from anyone, anywhere. Red Hat's commitment to diversity and inclusion creates an environment where all voices are valued and celebrated, driving continuous innovation and excellence in the technology industry.
About The Role
We are seeking a highly skilled Principal Machine Learning Engineer specializing in AI inference systems, with a focus on vLLM, to join our dynamic team. This remote/hybrid role based in Boston offers an exciting opportunity to be at the forefront of AI technology development. In this position, you will contribute to the design, development, and optimization of large language model (LLM) inference platforms, leveraging your expertise in high-performance computing, GPU architecture, and machine learning primitives. You will work closely with our open-source community and internal teams to address the most pressing challenges in model performance, scalability, and efficiency. Your work will directly impact the evolution of our cutting-edge AI deployment solutions, shaping the future of enterprise AI applications. As a leader in this space, you will mentor other engineers, foster innovation, and help maintain Red Hat's position as a pioneer in open-source AI technology.
Qualifications
- Extensive experience in writing high-performance code for GPUs using languages such as Python and C++
- Deep knowledge of GPU hardware architecture and performance optimization techniques
- Strong understanding of computer architecture, parallel processing, and distributed computing
- Experience with tensor math libraries such as PyTorch
- Proven expertise in GPU kernel optimization for deep neural networks
- Familiarity with high-performance networking technologies, including UCX, RoCE, InfiniBand, and RDMA
- Excellent communication skills for collaboration with technical and non-technical teams
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field; PhD is a plus
Responsibilities
- Develop and maintain high-performance inference systems, focusing on vLLM and related components
- Design, implement, and optimize inference algorithms and primitives to enhance model efficiency and scalability
- Conduct performance analysis, modeling, and benchmarking to identify and address bottlenecks
- Participate in technical design discussions, providing innovative solutions to complex problems
- Review code thoroughly, ensuring robustness, efficiency, and maintainability
- Mentor junior engineers, fostering a culture of continuous learning and innovation
- Collaborate with cross-functional teams to integrate AI models into enterprise solutions
- Stay up-to-date with the latest advancements in GPU computing, deep learning, and inference optimization
Benefits
- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account (FSA)
Language Skills
- English