This job posting is no longer available
About
Read all the information about this opportunity carefully, then use the application button below to send your CV and application.
What you will do:
Build high‑quality, high‑performing AI/ML applications and agent systems using modern inference platforms for multi‑modal and distributed model serving
Apply and optimize inference techniques including KV cache management, model quantization, and distributed serving to production workloads
Contribute to upstream inference runtime communities such as vLLM, TGI, PyTorch, OpenVINO, and related projects
Build multi‑modal AI applications integrating vision, language, and other modalities
Provide technical leadership and coordination across multiple stakeholders and engineering teams
Apply a growth mindset by staying current with rapid advancements in AI/ML inference technologies
Benchmark and analyze inference performance at scale, driving data‑driven optimization decisions
Publicize innovations through blogs, presentations, conferences, and other technical venues
What you will bring:
Bachelor's degree in Computer Science, Engineering, or equivalent experience
5+ years of experience in AI/ML engineering with focus on production inference systems
Deep expertise in PyTorch and modern deep learning frameworks
Hands‑on experience with inference runtime optimization (model serving, batching, KV cache management)
Advanced programming skills in Python and C++
Proven ability to contribute to and lead open source projects
Strong self‑motivation and organizational skills
Ability to work concurrently on multiple projects, independently and within a team environment
Excellent English written and verbal communication skills
Collaborative attitude and willingness to share ideas openly
The following are considered a plus:
Experience with vLLM, TGI (Text Generation Inference), or similar inference runtimes
Contributions to PyTorch, OpenVINO, or other inference frameworks
Experience with distributed model serving and GPU optimization
Familiarity with Kubernetes and cloud‑native AI/ML deployments
Knowledge of model quantization techniques (GPTQ, AWQ, FP8, etc.)
Experience with CUDA, Triton, or other GPU programming frameworks
Experience with diffusion models and diffusion transformers
Experience building AI agents and agentic systems
Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
Language skills
- English
Note for users
This job posting was published by one of our partners. You can view the original posting here.