About
The NOC Lead, reporting to the NOC Manager, will provide strategic and technical leadership to a team of Principal Engineers and Engineers. This role is accountable for ensuring the stability and functionality of applications, batch processes, network, and infrastructure components. The Lead will drive operational excellence by maintaining maximum availability (99.9%-99.99%), overseeing incident management, and ensuring timely resolution of escalations to meet or exceed established SLAs. Additionally, this position will guide the team in implementing best practices, fostering collaboration, and delivering continuous improvements across the NOC environment.
Job Overview
The Network Operations Center (NOC), a key part of iCIMS Technical Operations, is dedicated to monitoring applications and infrastructure to deliver an exceptional customer experience. The team ensures optimal performance by validating availability, coordinating cross-functional event responses, and communicating any customer-impacting incidents. Additionally, the NOC analyzes key performance indicators (KPIs) to forecast future trends and provide initial recommendations to the engineering team.
The Lead, NOC, who reports to the NOC Manager, is responsible for maintaining the functionality of applications, batch processes, network, and infrastructure components. This role ensures maximum availability (99.9%-99.99%) and drives timely resolution of incidents and technical escalations to meet established SLAs.
Responsibilities
Success Metrics
Ensure Production Stability: Monitor availability and performance across the entire production environment to maintain optimal operations. Provide off-hours support as needed.
Leverage Monitoring Tools: Track cloud resource utilization and performance metrics to identify trends and potential issues proactively.
Data-Driven Insights: Generate regular performance reports and recommend enhancements based on detailed analysis.
Incident Management Excellence: Lead the restoration of normal service operations swiftly, including assessment, research, escalation, communication, and resolution management.
Execute Production Changes: Implement necessary changes to support both internal and external customer needs.
Operational Support: Provide effective triage and resolution for operational support requests.
Documentation & Standards: Review and refine SOPs, policies, procedures, and system requirements to ensure accuracy and relevance.
Automation Development: Create and maintain automation scripts using Python and Java to streamline processes and reduce manual effort.
Infrastructure as Code (IaC): Apply IaC practices to improve deployment efficiency, consistency, and scalability.
Comprehensive Documentation: Prepare detailed electronic documentation, including SLAs, performance metrics, installation guides, and implementation guides.
Reduce Manual Work: Identify repetitive tasks and implement automation solutions to eliminate inefficiencies.
Performance Reviews: Participate in monthly metric reviews to support uptime goals of 99.9%-99.99%.
Drive Innovation: Demonstrate passion, initiative, and urgency in seeking innovative solutions and resolving issues effectively.
Qualifications
Technical Expertise
8+ years of administration and production support experience, including on-call responsibilities
10+ years of cloud provider experience with demonstrated knowledge
6 years of leadership experience
One certification in any technical area
Observability tooling experience
Preferred Qualifications
Experience with AWS / AWS Certifications
Exposure to other cloud technologies like Azure and GCP
Language Skills
- English