Software Engineering Manager Production Support Operations

SunTrust Investment Services, Inc.
  • US
    United States

About

Manager of Production Support
The Manager of Production Support leads teams responsible for ensuring the stability, resilience, and operational excellence of critical technology platforms supporting core lines of business. This role owns end-to-end production support operations while driving maturity toward engineering-first, site reliability–focused practices. The Manager identifies and resolves complex technical, operational, risk, and organizational challenges, while building high-performing, accountable teams across onshore and offshore locations. This position carries full people management responsibility, including hiring, coaching, performance management, and disciplinary actions, and serves as a key partner to Technology, Risk, and Business leadership.

Essential Duties and Responsibilities
The following is a summary of the essential functions for this job. Other duties, both major and minor, may be performed that are not mentioned below. Specific activities may change from time to time.

Production Support Leadership & Accountability
  • Own end-to-end production support operations for multiple mission-critical applications supporting key lines of business, ensuring availability, stability, and performance meet defined SLAs and SLOs.
  • Provide accountable, visible leadership for 24x7 operational support, including on-call models, escalation paths, and incident response effectiveness.
  • Act as the senior escalation point for major incidents, ensuring swift recovery, accurate root cause analysis, and durable remediation.

Incident & Problem Management
  • Lead cross-functional incident recovery efforts in partnership with Incident Management, engineering teams, infrastructure, and business stakeholders.
  • Ensure timely root cause analysis (RCA), post-incident reviews, and corrective actions that prevent recurrence.
  • Establish and mature a production knowledge base, documenting known issues, recovery procedures, and architectural insights.

Engineering-First & SRE Practices
  • Drive adoption of Site Reliability Engineering (SRE) and lean engineering principles, including:
      • Reduction of toil through automation
      • Engineering-based reliability metrics (error budgets, SLIs/SLOs)
      • Proactive resilience and failure prevention practices
  • Champion automation of repetitive and manual operational tasks, including incident detection, response, validation, and recovery where feasible.
  • Promote a culture of preventative engineering, partnering with development teams to improve system reliability upstream.

Monitoring, Observability & AI Enablement
  • Implement and continuously improve real-time monitoring, alerting, and observability across applications and infrastructure.
  • Measure and optimize the effectiveness of monitoring and alerting to eliminate noise and reduce mean time to detect and mean time to recover.
  • Leverage AI and advanced analytics to correlate telemetry data (logs, metrics, traces) and proactively identify emerging risks and root causes.
  • Champion the safe and responsible use of AI within production operations by adhering to enterprise guardrails and protecting sensitive data and system integrity.

Operational Readiness & Change Enablement
  • Oversee operational readiness across releases, disaster recovery and failover testing, and certificate and dependency lifecycle management.
  • Ensure production support is actively embedded in change planning, minimizing risk from releases and infrastructure changes.

People, Vendor & Financial Management
  • Lead one or more Agile teams (Scrum, Kanban), including onshore and offshore engineers, fostering high performance and accountability.
  • Manage workforce vendors and partners, setting expectations, reviewing performance, and ensuring delivery quality.
  • Own the budget and staffing plan, aligned to application criticality, operational risk, and business growth objectives.

Risk Management & Governance
  • Act as the first line of defense in production operations by proactively identifying and mitigating technology, operational, and resiliency risks.
  • Partner effectively with second-line Risk, Audit, and Regulatory teams, ensuring findings are addressed and controls are continuously improved.
  • Ensure compliance with internal policies, regulatory requirements, and external audit expectations.
  • Own and drive remediation plans for risk, audit, and regulatory findings, ensuring timely, effective, and sustainable resolution.
  • Lead responses to audit and regulatory inquiries, including providing evidence, clarifying controls, and appropriately challenging findings based on documented compliance.

Strategy, Influence & Continuous Improvement
  • Serve as a trusted advisor to senior Technology and Business leaders, communicating operational health, risk posture, and improvement roadmaps.
  • Lead or contribute significantly to large-scale initiatives, platform transformations, or regulatory-driven efforts.
  • Continuously assess organizational maturity and lead initiatives to improve reliability, efficiency, and talent capability.

Management Responsibilities
Full people management accountability, including:
  • Hiring and succession planning
  • Coaching and performance management
  • Compensation input and talent development
  • Disciplinary action and terminations as necessary

Agile & Operating Model Expectations
  • Act as an Agile and DevOps champion, embedding production support within fast-moving delivery models.
  • Balance "keep-the-lights-on" operational excellence with continuous engineering improvement.
  • Drive measurable outcomes such as improved uptime, reduced incident volume, faster recovery, and improved customer experience.

Qualifications

Required Qualifications
The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
  1. Bachelor's degree in Computer Science, Software Engineering, or a related technical field, or equivalent practical experience.
  2. A minimum of 5 years of professional software engineering experience, including team leadership or supervisory responsibilities.

Preferred Qualifications
  1. Understanding of multiple approaches to production support and software engineering delivery.
  2. Full understanding of Agile methodology.
  3. Experience leading teams in an Agile organization, particularly those practicing Site Reliability Engineering.
  4. Experience using AI agents in day-to-day activities, particularly for enabling software delivery and production support operations.
  5. Banking or financial services experience.
  6. Bachelor's degree and twelve years of experience in software development and production support, including five years of management experience.

Language skills

  • English
Note for users

This job posting comes from a partner platform of TieTalent. Click "Apply now" to submit your application directly on their website.