Machine Learning Engineer
Encore Technologies
- San Jose, Arizona, United States
About
Job Description
We are seeking Senior, Lead, Staff, and Principal Machine Learning Engineers to join our dynamic team. In this role, you will play a key part in deploying, optimizing, and scaling AI and machine learning models in production environments. You will bridge the gap between research and engineering by transforming cutting-edge ML innovations into reliable, high-performance, and cost-efficient services that power enterprise AI applications.
You will collaborate with AI researchers, MLOps engineers, platform teams, software engineers, data engineers, and product leaders to deliver scalable AI solutions while improving training infrastructure, inference performance, and operational reliability across the full machine learning lifecycle.
Key Responsibilities
Productize machine learning models developed by research teams into scalable, reliable, and highly available production services with clearly defined service level objectives (SLOs) for latency, availability, and cost.
Design, optimize, and scale distributed model training across multi-node and multi-GPU environments using modern distributed training techniques.
Improve model efficiency through optimization techniques such as quantization, pruning, distillation, KV‑cache optimization, Flash Attention, and inference acceleration.
Build, deploy, and maintain high-performance model serving infrastructure using modern inference frameworks and serving platforms.
Design scalable inference pipelines supporting batching, streaming, autoscaling, caching, load balancing, and memory optimization.
Integrate production AI systems with vector databases, feature stores, data lakes, and enterprise data pipelines.
Define, monitor, and continuously improve performance, reliability, utilization, and cost metrics for production AI systems.
Partner closely with MLOps teams to improve CI/CD pipelines, telemetry, observability, model registries, deployment automation, and production monitoring.
Collaborate with Machine Learning Scientists to ensure reproducible training, model evaluation, and seamless production handoffs.
Develop clean, maintainable, well-tested, and performant production code while contributing to engineering best practices.
Participate in architecture reviews, technical design discussions, and mentoring of engineering team members (Lead, Staff, and Principal levels).
Required Qualifications
Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related technical field.
Master’s degree preferred; equivalent industry experience will be considered.
Senior Level: 3–5 years of experience developing, deploying, and supporting production machine learning systems.
Lead, Staff, and Principal Levels: Progressive experience leading large-scale ML infrastructure initiatives, optimizing production AI platforms, mentoring engineers, and driving technical strategy.
Demonstrated experience deploying high-throughput, low-latency machine learning services in production environments.
Strong proficiency in:
- Python
- PyTorch (primary)
- TensorFlow
Experience with distributed training techniques including:
- Distributed Data Parallel (DDP)
- Fully Sharded Data Parallel (FSDP)
- ZeRO optimization
- Pipeline parallelism
- Tensor parallelism
Experience optimizing model performance using:
- Quantization (PTQ, QAT, AWQ, GPTQ)
- Model pruning
- Knowledge distillation
- KV-cache optimization
- Flash Attention
Experience with scalable model serving technologies such as:
- vLLM
- NVIDIA Triton Inference Server
- Hugging Face Text Generation Inference (TGI)
- ONNX Runtime
- TensorRT
- AITemplate
Strong understanding of:
- Full machine learning lifecycle
- Model deployment
- Inference optimization
- Performance profiling
- Production monitoring
- Capacity planning
Experience writing performant, maintainable, production-quality software.
Excellent analytical, problem-solving, communication, and collaboration skills.
Preferred Qualifications
Experience with vector databases and retrieval systems including:
- FAISS
- Milvus
- Pinecone
- pgvector
Experience with SQL, NoSQL databases, Parquet, Delta Lake, and object storage technologies.
Experience with MLOps platforms, model registries, CI/CD pipelines, Infrastructure as Code (IaC), and Kubernetes.
Familiarity with cloud platforms such as AWS, Azure, or Google Cloud Platform.
Experience building and operating Large Language Model (LLM) infrastructure and generative AI applications.
Experience collaborating across Research, Platform Engineering, Infrastructure, Data Engineering, and Product organizations.
Experience mentoring engineers, establishing engineering standards, and influencing technical direction across multiple teams.
Familiarity with distributed systems architecture, GPU optimization, and high-performance computing environments.
Work Environment & Location
Location: Remote, Hybrid, or Onsite (based on client requirements)
Travel: Minimal, as required
Collaborative team environment with opportunities to build and scale next-generation AI platforms, influence engineering strategy, and work on cutting‑edge machine learning technologies supporting enterprise-scale AI solutions.
Equal Opportunity Employer
Encore Talent Solutions is an Equal Opportunity Employer. We respect and seek to empower each individual and support the diverse cultures, perspectives, skills, and experiences within our workforce.
Languages
- English