Machine Learning Engineer

Sailplane

California, Maryland, United States

About

About the Company

In an unmarked building somewhere in Silicon Valley, a small team of engineers is working on what could become one of the most transformative technologies in enterprise computing: autonomous infrastructure that manages itself. Sailplane, backed by AI kingmaker Khosla Ventures (OpenAI's first investor) and seed specialist True Ventures, is building a "self-driving cloud" - intelligent agents capable of autonomously managing the largest and most advanced AI infrastructure on the planet.

Sailplane is solving one of the most complex challenges in modern computing: autonomous management of massive AI data centers. We are creating intelligent agents that operate rack-scale systems worth millions of dollars. Think Waymo for cloud infrastructure.

"We're building million-dollar agents," explains co-founder Sam Ramji, who previously led product at Google Cloud Platform and brought Linux to Microsoft. "These aren't consumer-grade chatbots - they're sophisticated autonomous systems managing rack-scale hardware worth millions per unit."

About the Role

Sailplane is an early-stage AI infrastructure startup. Expect to wear many hats (building ML platforms, MLOps tools, data/LLM infrastructure). You will bring a startup mindset: eager to take ownership of projects, navigate ambiguity, and move quickly to solve challenging problems in a fast-paced environment.

As an ML Engineer, you'll lead the build and operation of LLMs in production on-premises for Sailplane. This is a senior individual contributor role focused on hands-on coding, systems thinking, and prototyping. You won't manage a team, but you will mentor and amplify those around you. You should be fluent in models, adept at integrating production infrastructure and observability, and able to lead performance benchmarking. You're comfortable working in code and in diverse production environments, and you care deeply about correctness and quality.

This is a hybrid position reporting to the CEO, with 2 to 3 days per week expected in our downtown San Francisco office.

Responsibilities

Build, deploy, monitor, and operate LLMs in production on-premises in diverse customer environments
Implement MLOps best practices (CI/CD pipelines, containerization, continuous monitoring) to ensure reliable performance
Benchmark performance and recommend solutions to improve customer deployments, including hardware sizing for target throughput (tokens per second, concurrent user sessions)
Experiment and iterate on models by tuning parameters and testing new approaches, continuously improving accuracy and effectiveness through rigorous evaluation
Document ML work and ensure its reproducibility by tracking experiments, code, and model versions, fostering knowledge sharing and maintaining high standards in the team
Collaborate cross-functionally with software engineers, customers, and product stakeholders

Qualifications

Experience in ML engineering
Hands-on experience deploying models at scale, including familiarity with containerization (Docker, Kubernetes) and cloud platforms (AWS, GCP, or Azure) to build and operate ML systems in production
Experience with Prometheus, Grafana, distributed tracing, or ML-specific monitoring (Weights & Biases, MLflow for production)
Proficiency in programming (especially Python) and experience with modern ML frameworks/libraries such as TensorFlow and PyTorch
Deep understanding of machine learning algorithms and the model development lifecycle (data preprocessing, training, parameter tuning, and evaluation)
Proven track record of delivering software that creates real value for users
Excellent communication skills with an ability to explain complex ML concepts to non-experts, and a collaborative approach to working with cross-functional teams and partners

Required Skills

Fluency in ML models
Skill in integrating production infrastructure and observability
Ability to lead performance benchmarking
Comfort working in code and diverse production environments
Strong focus on correctness and quality

Preferred Skills

Experience in a VC-backed startup environment
Familiarity with containerization and cloud platforms
Experience with ML-specific monitoring tools

Pay range and compensation package

Comprehensive Health, Dental, and Vision coverage beginning on the first day for employees and their families, paid 100% by Sailplane
Equity grant participation
Flexible PTO with no accrual or set annual cap, plus 15 paid holidays per year
Health and Wellness stipend ($3,000 annually) to help support your personal health goals
AI tools stipend ($1,200 annually) to encourage hands-on familiarity with emerging tools
12 weeks of paid parental leave

Equal Opportunity Statement

Sailplane is committed to diversity and inclusivity in the workplace.

Language Skills

English

Notice to Users

This posting comes from a TieTalent partner platform. Click "Apply Now" to submit your application directly on their site.
