About The Company
Red Hat is a global leader in enterprise open-source software solutions, renowned for its innovative approach to delivering high-performing Linux, cloud, container, and Kubernetes technologies. Operating across more than 40 countries, Red Hat fosters a culture of collaboration, transparency, and inclusivity, empowering its employees to contribute their ideas and solve complex technical challenges. The company's community-powered model emphasizes open source principles, ensuring that the best ideas can originate from anyone, anywhere. Red Hat's commitment to diversity and inclusion creates an environment where all voices are valued and celebrated, driving continuous innovation and excellence in the technology industry.
About The Role
We are seeking a highly skilled Principal Machine Learning Engineer specializing in AI inference systems, with a focus on vLLM, to join our dynamic team. This remote/hybrid role based in Boston offers an exciting opportunity to be at the forefront of AI technology development. In this position, you will contribute to the design, development, and optimization of large language model (LLM) inference platforms, leveraging your expertise in high-performance computing, GPU architecture, and machine learning primitives. You will work closely with our open-source community and internal teams to address the most pressing challenges in model performance, scalability, and efficiency. Your work will directly impact the evolution of our cutting-edge AI deployment solutions, shaping the future of enterprise AI applications. As a leader in this space, you will mentor other engineers, foster innovation, and help maintain Red Hat's position as a pioneer in open-source AI technology.
Qualifications
- Extensive experience in writing high-performance code for GPUs using languages such as Python and C++
- Deep knowledge of GPU hardware architecture and performance optimization techniques
- Strong understanding of computer architecture, parallel processing, and distributed computing
- Experience with tensor math libraries such as PyTorch
- Proven expertise in GPU kernel optimization for deep neural networks
- Familiarity with high-performance networking technologies, including UCX, RoCE, InfiniBand, and RDMA
- Excellent communication skills for collaboration with technical and non-technical teams
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field; PhD is a plus
Responsibilities
- Develop and maintain high-performance inference systems, focusing on vLLM and related components
- Design, implement, and optimize inference algorithms and primitives to enhance model efficiency and scalability
- Conduct performance analysis, modeling, and benchmarking to identify and address bottlenecks
- Participate in technical design discussions, providing innovative solutions to complex problems
- Review code thoroughly, ensuring robustness, efficiency, and maintainability
- Mentor junior engineers, fostering a culture of continuous learning and innovation
- Collaborate with cross-functional teams to integrate AI models into enterprise solutions
- Stay up-to-date with the latest advancements in GPU computing, deep learning, and inference optimization
Benefits
- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account (FSA)
Language Skills
- English