
DevOps Engineer

AgileRL
  • GB
    London, England, United Kingdom

About

Overview
At AgileRL, we are on a mission to accelerate reinforcement learning for building superhuman artificial intelligence systems. We offer Arena, an enterprise-grade reinforcement learning operations (RLOps) platform, and a state-of-the-art open-source framework to accelerate RL development. Arena focuses on simulation, training, deployment and monitoring to enable scalable reinforcement learning workflows. We work with companies across industries to deliver autonomous solutions and are looking for talented engineers to develop the systems and tools that will enable the next wave of impactful AI.

Responsibilities

  • Design and maintain robust, scalable cloud infrastructure to support high-performance reinforcement learning workloads and distributed training environments
  • Build and optimise CI/CD pipelines for both our open-source framework and Arena enterprise platform, ensuring reliable deployments and automated testing
  • Implement and manage containerisation strategies using Docker and Kubernetes for ML model training, deployment, and orchestration
  • Develop infrastructure as code (IaC) solutions using tools like Terraform, CloudFormation, or Pulumi to ensure reproducible and version-controlled infrastructure
  • Monitor system performance, implement alerting and logging solutions, and troubleshoot production issues across distributed ML training environments
  • Collaborate with ML engineers to optimise resource allocation and cost efficiency for compute-intensive RL training workloads
  • Implement security best practices, manage access controls, and ensure compliance with enterprise security requirements
  • Automate operational tasks including backup strategies, disaster recovery procedures, and system maintenance
  • Support the deployment and scaling of GPU clusters and distributed computing resources for reinforcement learning applications
  • Maintain high availability and performance of production systems serving ML models to external customers

Requirements

  • Bachelor's degree or higher in Computer Science, Engineering, or a related field, or 3+ years of relevant DevOps/infrastructure experience
  • Strong experience with cloud platforms (AWS, GCP, Azure) and their ML/AI services, with expertise in managing compute-intensive workloads
  • Proficiency in containerisation technologies (Docker, Kubernetes) and container orchestration for ML workloads
  • Experience with Infrastructure as Code tools (Terraform, CloudFormation, Pulumi) and configuration management
  • Solid understanding of CI/CD principles and tools (GitHub Actions, GitLab CI, Jenkins) with experience in ML pipeline automation
  • Knowledge of monitoring and observability tools (Prometheus, Grafana, OpenObserve) and their application to ML systems
  • Experience with GPU infrastructure management and distributed computing frameworks for machine learning
  • Familiarity with MLOps practices and tools for model deployment, versioning, and lifecycle management
  • Strong scripting skills in Python, Bash, or similar languages for automation tasks
  • Understanding of networking, security, and database management in cloud environments
  • Experience with high-performance computing environments and job scheduling systems is a plus
  • Knowledge of machine learning workflows and the unique infrastructure requirements of ML training and inference
  • Strong problem-solving skills and ability to work in a fast-paced, collaborative environment
  • Excellent communication skills and experience working with cross-functional teams

Compensation

  • Competitive salary + significant stock options
  • 30 days of holiday, plus bank holidays, per year
  • Flexible working from home and 6-month remote working policies
  • Enhanced parental leave
  • Learning budget of £500 per calendar year for books, training courses and conferences
  • Company pension scheme
  • Regular team socials and quarterly all-company parties

Learn more about AgileRL at https://agilerl.com

Language skills

  • English
Notice to users

This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.