Machine Learning Engineer

Sailplane
  • US
    California, Maryland, United States

About

About the Company

In an unmarked building somewhere in Silicon Valley, a small team of engineers is working on what could become one of the most transformative technologies in enterprise computing: autonomous infrastructure that manages itself. Sailplane, backed by AI kingmaker Khosla Ventures (OpenAI's first investor) and seed specialist True Ventures, is building a "self-driving cloud" - intelligent agents capable of autonomously managing the largest and most advanced AI infrastructure on the planet.

Sailplane is solving one of the most complex challenges in modern computing: autonomous management of massive AI data centers. We are creating intelligent agents that operate rack-scale systems worth millions of dollars. Think Waymo for cloud infrastructure.

"We're building million-dollar agents," explains co-founder Sam Ramji, who previously led product at Google Cloud Platform and brought Linux to Microsoft. "These aren't consumer-grade chatbots - they're sophisticated autonomous systems managing rack-scale hardware worth millions per unit."

About the Role

Sailplane is an early-stage AI infrastructure startup. Expect to wear many hats (building ML platforms, MLOps tools, data/LLM infrastructure). You will bring a startup mindset: eager to take ownership of projects, navigate ambiguity, and move quickly to solve challenging problems in a fast-paced environment. As an ML Engineer, you'll lead the building and operation of LLMs in production on-premises for Sailplane. This is a senior individual contributor role focused on hands-on coding, systems thinking, and prototyping. You won't manage a team, but you will mentor and amplify those around you. You should be fluent in models, adept at integrating production infrastructure and observability, and able to lead performance benchmarking. You're comfortable working in code and in diverse production environments, and you care deeply about correctness and quality. This hybrid position reports to the CEO and is expected to work from our downtown San Francisco office 2 to 3 days per week.

Responsibilities

  • Build, deploy, monitor, and operate LLMs in production on-premises in diverse customer environments
  • Implement MLOps best practices (CI/CD pipelines, containerization, continuous monitoring) to ensure reliable performance
  • Benchmark performance and recommend solutions to improve customer deployments, including hardware sizing for target throughput (tokens per second, concurrent user sessions)
  • Experiment and iterate on models by tuning parameters and testing new approaches, continuously improving accuracy and effectiveness through rigorous evaluation
  • Document ML work and ensure its reproducibility by tracking experiments, code, and model versions, fostering knowledge sharing and maintaining high standards in the team
  • Collaborate cross-functionally with software engineers, customers, and product stakeholders

Qualifications

  • Experience in ML engineering
  • Hands-on experience deploying models at scale, including familiarity with containerization (Docker, Kubernetes) and cloud platforms (AWS, GCP, or Azure) to build and operate ML systems in production
  • Experience with Prometheus, Grafana, distributed tracing, or ML-specific monitoring (Weights & Biases, MLflow for production)
  • Proficiency in programming (especially Python) and experience with modern ML frameworks and libraries such as TensorFlow and PyTorch
  • Deep understanding of machine learning algorithms and the model development lifecycle (data preprocessing, training, parameter tuning, and evaluation)
  • Proven track record of delivering software that creates real value for users
  • Excellent communication skills with an ability to explain complex ML concepts to non-experts, and a collaborative approach to working with cross-functional teams and partners

Required Skills

  • Fluency in models
  • Adept at integrating production infrastructure and observability
  • Ability to lead performance benchmarking
  • Comfortable working in code and diverse production environments
  • Strong focus on correctness and quality

Preferred Skills

  • Experience in a VC-backed startup environment
  • Familiarity with containerization and cloud platforms
  • Experience with ML-specific monitoring tools

Pay range and compensation package

  • Comprehensive health, dental, and vision coverage beginning on the first day for employees and their families, paid 100% by Sailplane
  • Equity grant participation
  • Flexible PTO with no accrual or set annual cap, plus 15 paid holidays per year
  • Health and wellness stipend ($3,000 annually) to help support your personal health goals
  • AI tools stipend ($1,200 annually) to encourage hands-on familiarity with emerging tools
  • 12 weeks of paid parental leave

Equal Opportunity Statement

Sailplane is committed to diversity and inclusivity in the workplace.

Language skills

  • English

Notice to users

This job posting comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.