
This job offer is no longer available

DevOps/SRE Engineer

YANTRAN LLC
  • US
    United States

About

Job Title: Lead System Engineer-EngOps T1
Location: TechM US Texas Plano
Years of Experience: 7-10 Years

Job Description

This position demands senior-level expertise and involves hands-on responsibilities that lead to the delivery of applications or services. The role entails translating core architecture, aligned with business needs, into comprehensive technical solutions encompassing platform, network, software, cloud, and more. The engineer defines designs, provides technical guidance for application components and subsystems, and plays a key role in making design decisions for development teams.

Major responsibilities include detailing the design and interfaces of specific components: defining subsystems and their interactions, assigning responsibilities, understanding deployment strategies, and communicating interface requirements within the solution context. The role also involves aligning development teams around a unified technical vision and collaborating with them to refine the solution and its interfaces; validating technology assumptions and assessing implementation options; and establishing critical non-functional requirements (NFRs) at the solution level and contributing to others. This role requires senior technical expertise and deep knowledge of AT&T technologies.

Key Responsibilities

The EngOps Tier-1 team functions as 24x7x365 journey-focused operations engineers, serving as the first line of defense in responding to system incidents and disruptions. The team maintains active oversight of CTX applications and related upstream and downstream systems to ensure thorough monitoring and rapid issue resolution.

  • Proactive Monitoring: Continuously track alerts and dashboards, taking actions such as issuing C2W alerts or restarting pods when thresholds are exceeded.
  • Issue Identification: Receive notifications of issues through real-time alerts and dashboards.
  • Incident Validation: Investigate alerts to confirm validity and minimize false positives and anomalies.
  • Notification Protocol: Notify Tier-2 and leadership teams via PagerDuty, C2W, MS Teams, or text, specifying severity and impact.
  • Resolution: Collaborate with Tier-2 to resolve incidents within MTTR targets and participate in Root Cause Analysis (RCA) and After Action Reviews (AAR).
  • Change Request Reviews: Participate in RCCBs to review change requests, validate alerts and dashboards, and approve changes.
  • Sanity Testing: Perform health checks and regression tests on critical system components.
  • Alert Optimization: Improve alert processes and adjust thresholds as necessary.

Required Qualifications

  • Education: Bachelor's degree in Computer Science, Information Systems, or a related discipline.
  • Experience: Over 7 years on a Tier-1 team performing 24x7x365 journey-focused operations.

Languages

  • English