Principal Engineer, Operational Excellence

RemoteHunter

United States

About the Opportunity:

This opportunity is with a global cybersecurity organization focused on protecting the people, processes, and technologies that support modern businesses. The Technology Resilience Principal Engineer leads the technology resilience function, driving the strategy and execution of resilience practices across the organization's technology stack. The role ensures comprehensive technical resilience standards and practices across infrastructure, applications, and products, supporting business continuity and rapid recovery capabilities.

Responsibilities:

• Coordinate technology resilience initiatives across IT, Product, Engineering, and business units, aligning with business objectives

• Maintain enterprise-wide technology resilience standards to ensure consistent implementation across infrastructure, application, and product domains

• Develop and drive technical resilience architecture, including infrastructure redundancy, application resilience, and chaos engineering frameworks

• Lead technical recovery strategy development and implementation, covering backup systems, RTO/RPO, and data restoration procedures

• Define and implement product resilience standards, such as feature flagging, release testing, multi-tenancy, and scalability frameworks

• Oversee technology resilience risks and monitor key performance indicators, including system uptime

• Lead chaos engineering and resilience testing programs for proactive resilience validation and improvement

• Manage evaluation and implementation of shared resilience tooling for monitoring, testing, and recovery automation

• Build relationships with business units, engineering teams, and external partners for effective stakeholder engagement

• Act as senior technical advisor during major incidents, coordinating technical recovery strategies

• Identify and implement emerging technologies and methodologies to enhance resilience

• Mentor junior team members and share resilience engineering best practices

Requirements:

• 10+ years of experience in technology resilience, disaster recovery, site reliability engineering, or related technical fields, with expertise in enterprise-scale cloud-native environments

• Strong understanding of infrastructure redundancy, application resilience, chaos engineering, and disaster recovery strategies across hybrid cloud architectures

• Experience with feature management, progressive deployment, multi-tenant architecture resilience, and scalability engineering

• Demonstrated ability to drive strategic initiatives and influence senior stakeholders in large technology organizations

• Experience establishing and monitoring resilience metrics including system uptime, MTTR, RTO/RPO, and deployment success metrics

• Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)

• Excellent written and oral communication skills

Note:

RemoteHunter is not the Employer of Record (EOR) for this role. Our purpose in this opportunity is to connect exceptional candidates with leading employers. We help job seekers worldwide discover roles that match their goals and guide them to complete their full application directly through the hiring company's career page or ATS.

Language skills

English

Note for users

This job posting was published by one of our partners. You can view the original posting here.
