About
Nuance Labs is building the next generation of emotionally expressive, real-time AI.
In this critical role, you will build the infrastructure that powers our AI platform. You will own the systems that serve models at scale, orchestrate complex data workflows, and ensure our real-time video AI runs reliably with low latency for users worldwide.
Responsibilities
Own Inference Infrastructure:
Build and maintain the serving stack for multimodal AI workloads. Optimize for latency, throughput, and cost using batching strategies, autoscaling, and intelligent resource allocation.
Real-Time Video Streaming:
Architect systems to handle long-lived WebRTC connections with unpredictable client behavior, ensuring smooth video and audio delivery at scale.
Orchestrate Data Workflows:
Build robust pipelines for offline processing, evaluation, and training using orchestration frameworks like Dagster or Ray. Manage petabyte-scale video storage and network requirements.
GPU Cluster Management:
Configure and maintain GPU clusters using Kubernetes and Terraform. Implement monitoring, autoscaling based on custom metrics, and cost optimization strategies.
Developer Tooling:
Build CI/CD, evaluation, and versioning systems that enable safe, zero-downtime model deployments and rapid iteration cycles.
Requirements
Infrastructure Expertise:
Strong practical experience with Kubernetes, Terraform, and cloud platforms. You can design secure, scalable systems and debug complex distributed issues.
Systems Programming:
Proficiency in Python and experience with systems languages (Rust or Go). Comfortable profiling workloads and resolving compute, memory, or network bottlenecks.
Orchestration & Pipelines:
Experience managing large-scale offline workflows using tools like Dagster, Ray, Airflow, or similar frameworks.
Production Operations:
Deep understanding of production reliability, monitoring, incident response, and capacity planning for high-traffic services.
Preferred Experience
- Experience with WebRTC or real-time media pipelines in production
- Experience running GPU-backed inference services at scale (vLLM, Triton Inference Server, TensorRT)
- Knowledge of performance optimization and low-level systems debugging
- Familiarity with video/audio processing and storage systems
Nuance Labs Key Facts
- $10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels including Synthesia's former CPO.
- A world-class team of PhDs from MIT, UW, and Oxford with decades of industry experience at Apple and Meta, advancing real-time avatars from cutting-edge research to products used by millions.
- In-person collaboration, 5 days a week at Seattle HQ.
Language Skills
- English
Note for Users
This job posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their website.