
This job offer is no longer available

Senior Machine Learning Engineer, Services/MLOps

Adobe
  • US
    United States

About

Senior Machine Learning Engineer
Firefly Foundry is Adobe's enterprise managed-service offering for custom multimedia generative AI: deep-tuned image, video, and 3D models built on each customer's IP, paired with creative production workflows and a media-intelligence layer, and deployed across new and existing Adobe surfaces. The business has gained significant traction in Media & Entertainment, marketing, and consumer retail, and is expanding rapidly into adjacent verticals.

We are hiring a Senior Machine Learning Engineer to build the pipelines and services that turn Firefly Foundry's models into reliable, enterprise-grade products. You will compose heterogeneous model pipelines (including fine-tuned LLMs, image and video generation models, 3D mesh reconstruction, upsamplers, NSFW and safety checkers, and IP guardrail models), deploy them as services, scale those services to enterprise traffic, and hold them to SLAs for availability and latency, all while ensuring served quality matches the training and reference environment. Across this work you will integrate and operate multiple, distinct generative model architectures, in a mix that evolves quickly.

This is a high-ownership role in a fast-moving environment, with direct, measurable impact on the availability, latency, cost, and quality of everything Firefly Foundry ships. Depending on your focus area, you may own externalizable data pipelines for self-serve fine-tuning, optimized VLM deployments for media intelligence and querying, or the platform that lets the team deploy new pipelines rapidly with full observability.

What You Will Do

  • Own the full serving lifecycle for heterogeneous model pipelines, from research checkpoint to enterprise endpoint: packaging, versioned rollout, canary/rollback, and autoscaling.
  • Deploy these pipelines as services, scale them to enterprise traffic, and hold them to SLAs for availability, latency, and throughput.
  • Ensure served quality matches the training and reference environment, closing train/serve gaps across precision, preprocessing, and model versions.
  • Engineer for enterprise from the ground up: tenancy boundaries, data isolation, and the controls that let us honor customer IP contracts under audit.
  • Build the platform underneath it all: rapid pipeline deployment, observability, monitoring, and alerting.
  • Define and enforce quality gates in the deployment pipeline: automated eval, regression detection, and drift monitoring that block bad model versions from reaching production.
  • Own GPU capacity and cost: utilization, batching efficiency, and right-sizing acceleration fleets against latency SLAs.
  • Run production ML operationally: on-call, incident response, and postmortems for availability and latency regressions.

Depending on your focus area, you may also:

  • Build externalizable data pipelines that power self-serve fine-tuning flows for enterprise customers.
  • Stand up optimized VLM deployments for media intelligence and content querying.

Who You Will Partner With

  • Applied Science: to take research models into reliable, high-throughput serving and to keep served quality faithful to the training environment.
  • ML Engineering leadership and AI Platform: on shared infrastructure, accelerator capacity, and serving primitives at platform scale.
  • Firefly Foundry Studio: to translate creative production workflows into performant, dependable ML services.

What You Bring

  • 5+ years in machine learning engineering, with significant ownership of production ML or inference services at scale.
  • Strong Python and deep-learning engineering skills (PyTorch), with hands-on experience deploying and scaling model-backed services.
  • Experience composing multi-model pipelines and serving them behind APIs: orchestration, batching, autoscaling, and version management.
  • A track record owning production SLAs (availability, latency, and throughput), backed by real observability, monitoring, and alerting.
  • Comfort working across multiple, distinct generative model architectures (LLMs and VLMs, diffusion and transformer models, 3D/mesh), enough to integrate, optimize, and reason about output quality, in partnership with Applied Science.
  • Experience with multi-tenant systems and data isolation in an enterprise or regulated context.
  • Fluency with containers and orchestration (Docker, Kubernetes), CI/CD for ML, and a major cloud (AWS or Azure).
  • GPU inference optimization for latency and cost: quantization, batching, and serving runtimes; custom CUDA a plus.
  • Strong, data-driven problem-solving and excellent communication in cross-functional teams.

Education

Master's or PhD in Computer Science, Computer Engineering, or a related field, or equivalent practical experience building and operating production ML systems.

Languages

  • English
Notice for Users

This job was posted by one of our partners. You can view the original job source here.