
Principal ML Engineer, Machine Learning Platform – Systems Architecture

remoterocketship
  • US
    United States

About

Job Description:
  • Lead architecture and delivery for major ML platform capabilities across training, evaluation, deployment, and observability
  • Design scalable systems for distributed training, data processing, feature and model lifecycle management, and production inference
  • Own platform-level technical outcomes from design through deployment, operations, and continuous improvement
  • Drive the design and scaling of data pipelines for large-scale structured and semi-structured technical datasets
  • Lead architecture for distributed data processing and orchestration systems such as Ray, Airflow, Spark, or similar platforms
  • Establish strong practices for data lineage, provenance, governance, and responsible data usage in ML systems
  • Guide the design of model deployment, inference services, monitoring, and observability for production ML workloads
  • Contribute to the development of ML-ready representations for geometry, graph, hierarchical, or multimodal data
  • Clarify ambiguous problem spaces, define solution approaches, and lead execution across multiple engineers and teams
  • Establish and improve engineering standards, operational practices, and architectural patterns for ML systems
  • Lead incident response for critical platform issues and drive lasting improvements across system health and supportability
  • Mentor engineers and act as a force multiplier through design leadership, coaching, and technical reviews
  • Communicate technical strategy, tradeoffs, and execution plans clearly to technical and non-technical stakeholders

Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent industry experience
  • Typically 6 to 8 years of industry experience in software engineering, ML infrastructure, distributed systems, or platform engineering, including experience leading design and delivery of complex technical systems
  • Deep experience in software architecture, distributed systems, large-scale data platforms, or ML infrastructure
  • Strong proficiency in Python and strong command of production software engineering practices
  • Experience leading complex technical initiatives that span multiple engineers or cross-functional teams
  • Strong experience with large-scale data pipelines, distributed data processing, and cloud-native platform architectures
  • Experience with model deployment, inference systems, and production observability
  • Demonstrated ability to make architecture decisions that balance performance, scalability, reliability, and cost
  • Strong communication and stakeholder management skills

Benefits:
  • Health and financial benefits
  • Time away and everyday wellness

  • United States

Language skills

  • English
Notice to users

This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.