Director, Digital Reliability Engineering

Royal Caribbean Group
  • US
    Miami, Florida, United States

About

Journey with us! Combine your career goals and sense of adventure by joining our exciting team of employees. Royal Caribbean Group is pleased to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world.

The Royal Caribbean Group's Digital Team has an exciting career opportunity for a full-time Director, Digital Reliability Engineering, reporting to the VP of Engineering.

The position is onsite and based in Miami, Florida.

Position Summary:

The Director, Digital Reliability Engineering, will lead the global Technology Operations portfolio for Royal Caribbean's Digital organization, ensuring the reliability, availability, and performance of guest-facing pre-cruise platforms across web and mobile.

This leader is responsible for both Site Reliability Engineering (SRE) practices and run-the-business engineering support. Beyond incident response, the Director is accountable for managing and delivering on the resolution of all production issues, executing ongoing maintenance activities, and coordinating technical communications. This role also manages a dedicated engineering development capacity focused on production fixes, ongoing maintenance, and technical debt reduction, so that stability improvements are not only identified but also delivered. This person is expected to walk the talk: able to jump in during incidents, work side by side with engineers, and demonstrate technical depth when guiding solutions.

This is a hands-on role where the leader is expected to actively support teams during critical incidents, work directly with engineers to troubleshoot, and ensure sustained improvements in reliability.

This role also carries executive accountability for critical incidents. The Director must be prepared to provide leadership and direct support during major incidents at any time, ensuring the organization responds with speed, clarity, and effectiveness.

Essential Duties and Responsibilities:

  • Strategic Leadership
    • Define and execute the global SRE strategy for Digital Operations, aligning with business priorities and Royal Caribbean's long-term technology vision
    • Build and nurture a culture of reliability, resilience, and continuous improvement across all digital platforms
    • Drive initiatives to maintain zero downtime by rapidly addressing issues, conducting root cause analysis, and implementing remediations
    • Build strong relationships with product management, engineering, design, and operations stakeholders
    • Own and drive operational metrics (e.g., MTTx metrics, incident rates, error budgets, service availability) with visible progress and accountability

  • Hands-On Operational Engagement
    • Lead global site reliability and operations teams across onshore, nearshore, and offshore locations while actively engaging in day-to-day challenges
    • Actively participate in major incident response, including log analysis, recovery validation, and executive updates
    • Lead problem bridges, collaborating across technical and functional teams for timely issue resolution
    • Partner with engineers to diagnose, troubleshoot, and resolve critical issues in real time, demonstrating technical credibility
    • Strengthen ITSM processes (Incident, Problem, Change, Major Incident) using tools like ServiceNow, PagerDuty, and JIRA

  • Run-the-Business
    • Lead engineering support for production issue remediation, ensuring timely root-cause analysis, resolution, and prevention of recurring problems
    • Lead a dedicated production engineering team responsible for developing and deploying fixes, patches, and enhancements that improve reliability and guest experience
    • Ensure development workstreams include not only feature delivery but also operational hardening, technical debt remediation, and defect resolution
    • Manage and prioritize ongoing maintenance activities, patches, upgrades, and operational improvements across the digital technology stack
    • Establish strong feedback loops with product and engineering teams so that recurring issues and operational pain points are systematically eliminated

  • Technology & Engineering
    • Work directly with teams to ensure the reliability of a hybrid technology stack spanning:
      • Mobile: Native iOS, Android, and cross-platform frameworks
      • Web: React, Angular, and modern web technologies
      • Backend Services: Microservices, APIs, and integration layers
      • Commerce: SAP Hybris platform
      • Cloud Infrastructure: AWS (EC2, ECS, S3, API Gateway), DKP/on-prem clusters, and observability pipelines
    • Champion observability and performance practices, leveraging platforms such as Splunk, Dynatrace, Prometheus, and Quantum Metric / RUM tools
    • Promote automation, chaos engineering, and AI-driven anomaly detection to strengthen system resilience
    • Guide teams in Infrastructure as Code and modern operational tooling
    • Environment Management: Oversee all environment activities, including new code deployments

  • Team Development & Leadership by Example
    • Recruit, mentor, and develop global SRE talent while modeling hands-on technical engagement
    • Encourage engineers to take ownership and proactively solve problems, supported by your direct involvement when needed
    • Manage vendor and partner teams with the same "roll-up-your-sleeves" approach as internal teams
    • Deliver executive-ready dashboards and insights to communicate the health of digital operations

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or related field
  • 15+ years of experience in technology operations, including 8+ years in global leadership roles
  • Engineering Management: Experience leading software engineering teams delivering production fixes and technical debt remediation, not only operational monitoring
  • Proven track record supporting and stabilizing large-scale digital/commerce platforms with high transaction volumes and direct customer impact
  • Experience managing fast-paced 24x7 environments, demonstrating adaptability and confident decision-making
  • Strong technical background in cloud platforms (AWS, hybrid/on-prem clusters), container orchestration (Docker, Kubernetes, DKP), and microservices
  • Deep understanding of SOA principles and Web Services
  • Proficiency in scripting: Bash, Python, JavaScript
  • Experience running and scaling commerce platforms (preferably SAP Hybris or equivalent)
  • Advanced knowledge of observability, performance engineering, telemetry, automation, and incident management frameworks
  • Ability to personally dive into logs, code, and dashboards during critical incidents
  • Strong troubleshooting, root-cause analysis, and application design skills
  • Demonstrated ability to lead through crisis situations with composure, speed, and clear communication

Knowledge and Skills:

  • Technical Depth & Breadth: Mobile, web, backend, and commerce systems at enterprise scale
  • Leadership by Example: Hands-on, willing to engage directly with engineers in solving problems
  • Strategic Thinking: Ability to drive long-term improvements while ensuring short-term incident readiness
  • Maintenance & Communication: Experience managing ongoing maintenance programs and crafting technical communications
  • Engineering Collaboration: Skilled at bridging operations and engineering to ensure production issues are treated as high-priority deliverables
  • Communication: Executive presence with the ability to brief leadership clearly during outages
  • Global Experience: Skilled at leading distributed teams and managing vendor partnerships
  • Resiliency Mindset: Comfortable with 24/7 operational accountability, especially during major incidents

Financial Responsibilities:

  • Own and manage the Operational Expenditure (OPEX) budget for Digital Operations, ensuring efficient allocation of resources while balancing reliability, scalability, and cost optimization
  • Provide transparency into operational spend through regular reporting and executive updates
  • Partner with Finance and Procurement to negotiate, track, and optimize vendor contracts and third-party services
  • Ensure budget discipline while identifying opportunities for automation and efficiency improvements to reduce operational costs without compromising reliability

Working Conditions:

  • Global role requiring flexible availability to lead and engage directly in critical incidents outside of standard business hours
  • Domestic and international travel may be required to support operations and vendor partners

We know there's a lot to consider. As you go through the application process, our recruiters will be glad to provide guidance and more relevant details to answer any additional questions. Thank you again for your interest in Royal Caribbean Group. We hope to see you onboard soon!

It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.

Nearest Major Market: Miami

Language Skills

  • English