About
You'll have the time, space, and support to dive deep into your projects, building lasting technical and strategic mastery as a trusted leader alongside developers, product stakeholders, and project managers.

We believe in continuous growth. Our team is dedicated to professional development and knowledge-sharing.

We protect balance. Our distributed team culture is based on trust and flexibility, offering unlimited PTO, a flexible remote work policy, and a supportive environment that prioritizes sustainable, long-term performance.

About the Role
The Cloud Infrastructure and AI DevOps Manager leads teams responsible for designing, operating, and scaling the AI/ML infrastructure, cloud platforms, and DevOps automation that support enterprise model training, inference, and generative AI workloads. The role involves setting strategy for, and executing on, cloud-native, Kubernetes-based platforms that enable reliable, secure, and cost-efficient AI systems.

This management position blends hands-on technical leadership with people management, delivery ownership, and strategic decision-making. The role oversees distributed compute environments, GPU clusters, CI/CD pipelines, and vector-search infrastructure while ensuring high availability, resilience, and compliance with security and responsible AI standards.

This manager will work closely with AI engineering, data engineering, product, and security teams, acting as the primary technical owner for assigned initiatives and communicating system risks, trade-offs, and progress to leadership.

Key Responsibilities
- Leading engineering teams responsible for AI/ML infrastructure, cloud operations, and MLOps automation.
- Defining cloud, Kubernetes, and infrastructure strategy to support scalable model training, inference, and generative AI platforms.
- Guiding the design and operation of distributed compute environments, GPU clusters, and vector database infrastructure.
- Overseeing CI/CD pipelines that automate model training, testing, deployment, monitoring, and lifecycle management.
- Managing incident response, failure analysis, and reliability engineering across AI platforms.
- Directing performance testing, capacity planning, and cost optimization for AI infrastructure.
- Ensuring compliance with cloud security, IAM practices, governance requirements, and responsible AI frameworks.
- Implementing multi-cloud resilience patterns, high availability, and automated failover for critical AI workloads.
- Supporting platform modernization initiatives, including adoption of optimized LLM runtimes and new orchestration technologies.
- Evaluating third-party infrastructure tools, GPU scheduling solutions, and platform enhancements.
- Communicating system status, dependencies, risks, and technical decisions to senior leadership.
- Managing 4-5 direct reports, including coaching, performance management, and career development.
- Owning project delivery, including budget, timelines, and quality of outcomes.
- Coordinating with sales and stakeholders on project sizing, feasibility, and strategic opportunities.
- Driving continuous improvement initiatives to advance DevOps maturity and the operational readiness of AI infrastructure.

Qualifications
- 7+ years of professional experience in DevOps, cloud engineering, MLOps, or platform engineering.
- 2+ years of experience in engineering leadership or senior technical leadership roles.
- Expert proficiency with distributed cloud systems, Kubernetes, and infrastructure-as-code.
- Advanced troubleshooting skills across infrastructure, networking, container, and deployment issues.
- Proficiency in Python, Bash, or similar automation and scripting languages.
- Strong understanding of monitoring, observability, and reliability engineering patterns.
- Hands-on experience supporting infrastructure for ML or generative AI workloads.
- Strong leadership, communication, and cross-functional collaboration skills.

Preferred Qualifications
- Bachelor's degree in computer science, engineering, cloud computing, or a related field.
- Master's degree in a technical discipline.
- Cloud and AI certifications, including Azure or equivalent AWS/GCP certifications.
- Extensive experience with Kubernetes platforms and cloud ML services.
- Experience with GPU workload orchestration, optimization, and multi-tenant inference environments.
- Expertise in observability and distributed tracing tools.
- Strong experience with Terraform and infrastructure governance at scale.
- Familiarity with service mesh architectures and advanced deployment patterns.
- Advanced experience supporting generative AI platforms and LLM inference runtimes.
- Experience operating fine-tuned LLMs, managing GenAI CI/CD pipelines, and implementing monitoring solutions.
- Demonstrated ability to make strategic technical decisions within defined delivery and budget constraints.

We expect the candidate to uphold Crowe's values of Care, Trust, Courage, and Stewardship, acting ethically and with integrity at all times.

The application deadline for this role is 04/30/2026.

In compliance with federal law, all hired individuals will be required to verify their identity and eligibility to work in the United States and to complete the required employment eligibility verification form upon hire. Crowe is not sponsoring work authorization at this time.

The wage range for this role considers a variety of factors, including skill sets, experience, training, and other organizational needs. A reasonable estimate of the current range is $102,400.00 - $204,100.00 per year.

Our Benefits: At Crowe, we know that great people are what make a great firm, and we offer a comprehensive total rewards package.

How You Can Grow: We nurture your talent in an inclusive culture that values diversity, providing opportunities to meet with your Career Coach, who can guide you in your career aspirations.

More about Crowe: Crowe is one of the largest public accounting, consulting, and technology firms in the United States, offering audit services and helping clients achieve their goals across a range of services. Crowe promotes equal employment opportunities and prohibits discrimination of any type.
We also consider qualified applicants with criminal histories in compliance with applicable laws.
Language Skills
- English