Lead Site Reliability Engineer
- Dunstable, England, United Kingdom
About
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Lead Site Reliability Engineer
Who is Mastercard?
At Mastercard technology, we work to connect and power an inclusive, digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible. Using secure data and networks, partnerships, and passion, our innovations and solutions help individuals, financial institutions, governments, and businesses realize their greatest potential. Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. We cultivate a culture of inclusion for all employees that respects their individual strengths, views, and experiences. We believe that our differences enable us to be a better team – one that makes better decisions, drives innovation, and delivers better business results.
Technology at Mastercard
What we create today will define tomorrow. Revolutionary technologies that reshape the digital economy to be more connected and inclusive than ever before. Safer, faster, more sustainable.
And we need the best people to do it. Technologists who are energized by the challenges of a truly global network. With the talent and vision to create the critical systems and products that power global commerce and connect people everywhere to the vital goods and services they need every day.
Working at Mastercard means being part of a unique culture. Inclusive and diverse, a rich collaboration of ideas and perspectives. A place that celebrates your strengths, values your experiences, and offers you the flexibility to shape a career across disciplines and continents. And the opportunity to work alongside experts and leaders at every level of the business, improving what exists, and inventing what’s next.
About the Role
The Business Operations team is seeking a highly motivated and experienced Lead Site Reliability Engineer (SRE) to join our team. You will play a critical role in ensuring the reliability, scalability, and performance of our applications, supporting essential services that power Mastercard's global operations. As a thought leader in your field, you will bring technical expertise, a passion for automation, and the ability to mentor.
The role of the Business Operations Site Reliability Engineer is to be the production readiness steward for Mastercard products. As Business Operations SREs, we are responsible for ensuring that our platform is stable and healthy. We break down barriers to running our products by fostering developer-run ownership and empowering developers to build resilient products. During the application build phase, we support our developers in software run principles, including operational design, automation, capacity planning, and monitoring, that lead to fault-tolerant, scalable products. We see the big picture and help create and enforce operations standards while facilitating an agile and learning culture.
We support daily operations with a sharp focus on triage and root cause analysis, understanding the business impact of our products and then performing blameless post-mortems. The goal of every Business Operations team is to engage early in the development lifecycle, and to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications.
Business Operations teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments. Ultimately, the role of Business Operations is to align Product and Customer Focused priorities with Operational needs by providing continuous feedback throughout the lifecycle.
As part of the Business Operations team, you will:
• Be a developing subject matter expert in the Site Reliability Engineering area, influencing stakeholders and applying advanced knowledge to drive achievement of area goals and initiatives by contributing to solution development and improvements for existing products, services, and/or processes.
• Implement and maintain high-availability system solutions, ensuring stability, performance, and operational continuity.
• Evaluate operational requirements to develop effective technical solutions within existing frameworks.
• Lead automation and scripting efforts to streamline operational processes and incident response workflows.
• Troubleshoot and resolve complex system issues, escalating as necessary to maintain system health and proactively address risks.
• Contribute to documentation, knowledge sharing, and best practices to improve team operational procedures.
• Conduct reviews and quality assurance activities to uphold organizational standards for system stability.
• Keep current with industry trends and emerging technologies relevant to system reliability and operational automation.
• Guide and mentor junior team members through on-the-job experiences, reviewing work and fostering a culture of continuous improvement to grow expertise around their discipline.
Role qualifications:
The ideal candidate will apply the following skills independently and consistently in complex or nuanced situations, begin using the skills to support broader goals, and be recognized as a key contributor who may coach or support others informally.
• Observability - Ability to use scripting and tooling to implement observability solutions, enabling the collection, analysis, and visualization of metrics, logs, and traces to support incident detection, diagnosis, and continuous service improvement.
• Programming and Scripting - Ability to write and maintain code and scripts to automate tasks, build operational tools, and support monitoring, deployment, and incident response using languages such as Python, Go, Bash, or similar.
• Systems and Network Administration - Ability to configure, operate, and troubleshoot Linux/Unix systems and network components, applying knowledge of networking concepts, protocols, security, and system reliability.
• Cloud Computing and Infrastructure - Ability to design, deploy, and manage applications and infrastructure on cloud platforms (e.g., AWS, Azure, GCP), ensuring scalability, security, availability, and operational efficiency.
• Reliability and Scalability - Ability to design and operate systems for high availability, fault tolerance, and disaster recovery, while ensuring systems can scale to meet current and future demand.
• DevOps Practices - Ability to apply DevOps principles and practices, including CI/CD pipelines, containerization, and orchestration, to enable faster, more reliable software delivery and operations.
• Troubleshooting - Capability to systematically identify, diagnose, and resolve technical issues across systems, applications, and networks, using analytical methods and tools to restore functionality, minimize disruption, and ensure stable operations.
• Capacity Planning and Performance Optimization - Ability to monitor resource utilization, forecast future capacity needs, and optimize system performance to support growth, scalability, and efficient infrastructure usage.
• IT Service Management - Ability to apply IT service management principles to incident, problem, and change management, ensuring reliable service delivery, effective incident response, and continuous service improvement aligned to business needs.
• Proactive Monitoring and Improvement (SRE Applications) - Ability to use application reliability signals to anticipate issues, identify risks, and drive preventative improvements that enhance application performance and availability.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
• Abide by Mastercard’s security policies and practices;
• Ensure the confidentiality and integrity of the information being accessed;
• Report any suspected information security violation or breach; and
• Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.
Language skills
- English
This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.