Software Engineering Inference Engineer

Virtue AI
  • United States

About

Inference Engineer
Virtue AI sets the standard for advanced AI security platforms. Built on decades of foundational and award-winning research in AI security, its AI-native architecture unifies automated red-teaming, real-time multimodal guardrails, and systematic governance for enterprise apps and agents. Deploy in minutes, across any environment, to keep your AI protected and compliant. We are a well-funded, early-stage startup founded by industry veterans, and we're looking for passionate builders to join our core team.

What You'll Do
As an Inference Engineer, you will own how models are served in production. Your job is to make inference fast, stable, observable, and cost-efficient, even under unpredictable workloads. You will:

  • Serve and optimize inference for LLM, embedding, and other ML models across multiple model families
  • Design and operate inference APIs with clear contracts, versioning, and backward compatibility
  • Build routing and load-balancing logic for inference traffic
  • Package inference services into production-ready Docker images
  • Implement logging and metrics for inference systems
  • Analyze server uptime and failure modes
  • Design GPU and model placement strategies
  • Work closely with backend, platform (Cloud, DevOps), and ML teams to align inference behavior with product requirements

What Makes You a Great Fit
You understand that inference is a systems problem, not just a model problem. You think in QPS, p99 latency, GPU memory, and failure domains.

Required Qualifications
  • Bachelor's degree or higher in CS, CE, or a related field
  • Strong experience serving LLMs and embedding models in production
  • Hands-on experience designing inference APIs and load-balancing and routing logic
  • Experience with SGLang, vLLM, TensorRT, or similar inference frameworks
  • Strong understanding of GPU behavior
  • Experience with Docker, Prometheus metrics, and structured logging
  • Proven ability to debug and fix real inference failures in production
  • Experience with autoscaling inference services
  • Familiarity with Kubernetes GPU scheduling
  • Experience supporting production systems with real SLAs
  • Comfortable operating in a fast-paced startup environment with high ownership

Preferred Qualifications
  • Experience with GPU-level optimization: memory planning and reuse, kernel launch efficiency, reducing fragmentation and allocator overhead
  • Experience with kernel- or runtime-level optimization: CUDA kernels, Triton kernels, or custom ops
  • Experience with model-level inference optimization: quantization (FP8 / INT8 / BF16), KV-cache optimization, speculative decoding or batching strategies
  • Experience pushing inference efficiency boundaries (latency, throughput, or cost)

Why Join Virtue AI
  • Competitive salary + equity
  • Direct ownership of inference reliability and performance
  • Hard problems at the intersection of systems, GPUs, and AI
  • Production impact: your work directly affects latency, cost, and uptime
  • Strong technical culture: engineers who debug and optimize, not just prototype

Language Skills

  • English
Notice to Users

This job offer comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their website.