Senior / Lead Machine Learning Engineer, Serving - Germany
Inworld
- United States
About
Experience We Find Useful
Inference Optimization.
Deep understanding of modern serving frameworks and techniques like vLLM or TRT‑LLM.
Model Acceleration.
Hands‑on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
High‑Performance Systems.
Proficiency in C++, CUDA, Rust, or highly optimised Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs.
Distributed Systems & Scaling.
Experience with Kubernetes, Ray, custom load balancing, multi‑GPU/multi‑node inference, and reliably handling thousands of concurrent connections.
Public work.
Non‑trivial systems programming projects, open‑source contributions to major inference engines, or deep‑dive technical write‑ups.
Full‑cycle ownership.
You can take a model from the research team, containerise it, optimise its serving, and ensure it runs reliably in production.
Background.
PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.
Professional fluency in English.
Written and spoken fluency is required, as you will be collaborating daily with our US‑based leadership and engineering teams.
Who Thrives Here
You don’t need a roadmap to start walking; you’re comfortable picking a direction and building the map as you go.
You believe engineering isn’t finished until it’s shipped and stable. You have a bias for impact over purely theoretical optimisations.
You don’t just ship code; you obsess over the why. You’re the first to question an architecture if you think there’s a better way to solve the core latency or throughput problem.
You aren’t satisfied with "the PM said so." You thrive on deep context and want to understand the fundamental logic behind every decision we make.
What Working Here Is Like
We hand you unclear problems and expect you to make them clear. We value engineers who say "I don’t know yet" and then design the benchmark or prototype that finds out. We treat performance, latency, and reliability as first‑class product features, not a box to check before launch. Impact comes before everything else, though we support sharing work and open‑source contributions that move the field forward. Your work should be visible. Flat structure, fast iterations, minimal process theatre.
Relocation Support
For candidates interested in relocating to the San Francisco Bay Area in the future, full US visa and relocation support may be available, subject to business needs and applicable legal and work authorisation requirements.
Language skills
- English
Note for users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.