Staff Software Engineer

RemoteHunter
  • US
    United States

About

About the Opportunity:

The organization delivers AI solutions that integrate into core business processes, enabling teams to develop, deliver, and govern AI at scale. The AI Compute team builds and operates the foundational compute infrastructure supporting all AI products and demanding workloads, with a focus on performance, efficiency, and scalability. The Staff Software Engineer will lead technical initiatives, contribute hands-on to complex problems, shape architecture, and mentor other engineers, supporting resilient and high-performing infrastructure systems.

Responsibilities:


• Build systems that keep microservices secure, performant, and reliable, and that move them quickly from idea to production.


• Develop solutions that optimize and recommend right-sized Kubernetes resources for efficient cloud usage.


• Design and architect automated quality platforms to accelerate release cycles while maintaining performance, security, and reliability.


• Collaborate with Product, Legal, and Security to ensure compliance and security in continuous delivery processes.


• Ensure operational playbooks exist for pipelines to support 24/7 system operation.


• Collaborate with architects and platform engineers to set continuous delivery and performance requirements.


• Work with internal product managers to set roadmaps and define milestones for innovative solutions to platform engineering challenges.

Requirements:


• Expert proficiency in Kubernetes architecture and operations (resource management, scheduling, auto-scaling, Gateway API/Ingress, Prometheus, OpenTelemetry) or in similar orchestrators (Nomad, Slurm).


• Experience with GPU clusters or multi-node AI/ML environments.


• Experience setting technical direction, architectural decision-making, and driving consensus across teams.


• Proven track record leading large-scale projects.


• Experience mentoring senior engineers and fostering a positive team culture.


• Operational excellence in defining and improving SLAs based on customer experience.


• 8+ years of experience in software development.


• 5+ years of experience with Python in diverse software projects.


• Experience designing and operating CI/CD pipelines with


• Experience building and maintaining large-scale build, testing, and deployment systems for Kubernetes, including Helm charts.


• Preferred: Experience with Golang, Terraform, Terragrunt, Chronosphere, and multi-cloud platforms (AWS, Azure, GCP, OpenShift).


• Nice to have: Experience with distributed compute frameworks (Ray, Dask), large-scale job schedulers (Slurm, Kueue), CKAD certification, public contributions to development projects, agentic AI experience, and managing NVIDIA infrastructure (NIM Operator, NVIDIA GPU Operator).

Benefits & Perks:


• Medical, Dental, and Vision Insurance


• Flexible Time Off Program


• Paid Holidays


• Paid Parental Leave


• Global Employee Assistance Program (EAP)

Note:

RemoteHunter is not the Employer of Record (EOR) for this role. Our purpose in this opportunity is to connect exceptional candidates with leading employers. We help job seekers worldwide discover roles that match their goals and guide them to complete their full application directly through the hiring company's career page or ATS.

  • United States

Language skills

  • English
Notice to users

This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.