About
At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.
Key responsibilities
Design and implement cloud infrastructure from the ground up
Build and maintain Kubernetes clusters optimized for GPU workloads and ML applications, as well as for production SaaS hosting
Implement GitOps practices using ArgoCD for continuous deployment
Develop infrastructure as code using Terraform
Create and maintain CI/CD pipelines for infrastructure and application deployment
Implement monitoring and observability solutions for distributed systems
Automate infrastructure management with Python and Bash
Collaborate with ML engineers to optimize infrastructure for model training and serving
Implement and maintain cost optimization strategies (FinOps) for cloud resources
Monitor and optimize cloud spending, especially for GPU-intensive workloads
Must have
5+ years of experience in cloud infrastructure and DevOps
3+ years of experience with Python
Strong experience with AWS and GCP cloud platforms
Deep expertise in Kubernetes, including multi-cluster management, GPU workload optimization, resource scheduling and autoscaling, and network policies and security
Experience with GitOps tools (ArgoCD preferred)
Extensive experience with cloud networking, including VPC design, load balancer configuration, network security and segmentation, and cross-cloud networking solutions
Strong CI/CD expertise, preferably with GitHub Actions
Proficiency in infrastructure as code (Terraform)
Experience with monitoring and observability tools
Experience with FinOps practices and cloud cost optimization
Nice to have
Experience with ML workflow tooling (MLflow, Kubeflow, or similar)
Experience with FastAPI and backend applications
Familiarity with data platforms like Databricks or Snowflake
Exposure to SRE practices or cloud security certifications
Hands-on experience with Prometheus, Grafana, or Datadog
Benefits
Competitive compensation with salary and equity
Comprehensive health coverage (medical, dental, and vision) and a 401(k) plan
Fertility support, as well as paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
Relocation support for employees moving to join the team in one of our office locations
A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action
Language skills
- English
Note for users
This job listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their website.