This job offer is no longer available

Lead Site Reliability Engineer

Iron Mountain
  • US
    Madison, Wisconsin, United States

About

At Iron Mountain we know that work, when done well, makes a positive impact for our customers, our employees, and our planet. That's why we need smart, committed people to join us. Whether you're looking to start your career or make a change, talk to us and see how you can elevate the power of your work at Iron Mountain.

We provide expert, sustainable solutions in records and information management, digital transformation services, data centers, asset lifecycle management, and fine art storage, handling, and logistics. We proudly partner every day with our 225,000 customers around the world to preserve their invaluable artifacts, extract more from their inventory, and protect their data privacy in innovative and socially responsible ways.

Are you curious about being part of our growth story while evolving your skills in a culture that will welcome your unique contributions? If so, let's start the conversation.

Job Summary

Iron Mountain is seeking a proactive and skilled Observability Automation & Integration Lead Engineer to join our Infrastructure Transformation team.

In this role, you will be responsible for implementing, managing, and enhancing enterprise observability and automation platforms to ensure optimal network and application performance across a global ecosystem.

The Infrastructure Transformation team is a dynamic group dedicated to modernizing our technical infrastructure, driving efficiency through automation, and ensuring the continuous availability of critical systems.

What You'll Do (Responsibilities)

In this role, you will:

  • Responsibility 1: Drive Enterprise Platform Engineering - Design, implement, and maintain highly available, 24x7 continuous monitoring solutions using platforms like Datadog and SolarWinds, including configuring alerts, creating dashboards, and conducting data trend analysis.

  • Responsibility 2: Champion Automation & Integration - Collaborate with Enterprise Architects and operations teams to automate infrastructure operations, integrate monitoring data with platforms like Configuration Management Database (CMDB)/ServiceNow, and identify opportunities for proactive monitoring solutions.

  • Responsibility 3: Ensure Design and Operational Adherence - Ensure compliance with architectural governance and security standards in all designs, drive process improvements, and provide on-call support for critical issues outside of normal business hours.

What You'll Bring (Skills & Qualifications)

The ideal candidate will have:

  • 10+ years of experience in monitoring platform engineering with tools such as Datadog, SolarWinds, Prometheus, or Grafana.

  • Strong knowledge of network and application performance monitoring, including configuring monitors using protocols like Simple Network Management Protocol (SNMP), Secure Shell (SSH), Windows Remote Management (WinRM), Windows Management Instrumentation (WMI), or Java Management Extensions (JMX).

  • Proven ability in automating infrastructure operations using tools like Ansible and Python and integrating systems via Representational State


Languages

  • English