This job posting is no longer available
About
We are seeking a highly skilled and experienced Principal Software Engineer focused on Agentic AI and DevOps. The ideal candidate will architect and deliver agentic microservices and platform capabilities, lead cloud-native DevOps at scale, and partner with organizational leaders to communicate strategy, status, and results. Deep hands-on expertise with Azure, Kubernetes, CI/CD, infrastructure as code, and LLM/agent frameworks (LangChain/LangSmith/OpenAI/LiteLLM) is essential. Experience with dataflow orchestration (Apache NiFi), enterprise integrations (ServiceNow/Snowflake/Power BI/SharePoint), and production-grade observability is highly desirable.
What You'll Do:
- Architect, build, and operate agentic AI services and microservices leveraging LangChain, LangSmith, OpenAI/Azure OpenAI, and LiteLLM; implement tool-use orchestration, evaluation, and guardrails.
- Design, build, and maintain CI/CD pipelines using Azure DevOps (ADO) YAML and GitHub Actions; enforce trunk-based workflows, quality gates, progressive delivery, and automated rollbacks.
- Stand up and manage Azure infrastructure (AKS, Service Bus, Event Hubs, Storage Accounts, Key Vault, Bastion); codify environments with Terraform; implement secure networking, secrets, and RBAC.
- Containerize and ship services with Docker/Buildah; operate Kubernetes with CNI networking and Linkerd service mesh; implement canary/blue-green strategies and autoscaling.
- Create and operate Apache NiFi dataflows; deploy and manage NiFi clusters on AKS with VM Scale Sets, enabling resilient, scalable ingestion and orchestration.
- Implement enterprise-grade observability and logging: ELK/EFK (Elasticsearch, Fluentd/Fluent Bit, Kibana), Prometheus metrics, Azure Dashboards, and KQL-based alerting.
- Engineer data and analytics integrations: Azure Databricks, PostgreSQL, Snowflake; operationalize Power BI, SharePoint, and Jupyter-based workflows.
- Build robust platform and app integrations: ServiceNow APIs, REST APIs, SMTP/IMAP/POP email automations; configure and manage NGINX/HAProxy load balancers.
- Lead incident response, root-cause analysis, and postmortems; continuously improve reliability, performance, security, and cost.
- Mentor teams, drive architectural runway, and communicate plans, trade-offs, and outcomes to stakeholders and leadership.
Key Qualifications / Experience Required:
DevOps Experience
- Expert-level hands-on DevOps across Azure and Kubernetes: CI/CD, Git workflows, infrastructure as code, automated testing, monitoring, and secure deployment.
- Proficiency with Azure DevOps (ADO) YAML pipelines and GitHub Actions; experience optimizing pipelines for cloud-native systems.
- Strong Kubernetes operations including CNI networking and service mesh (Linkerd); container build and supply chain (Docker, Buildah).
- Observability at scale using ELK/EFK, Prometheus, Fluentd/Fluent Bit, Azure Monitor dashboards and alerting (KQL).
Automation Skills
- Deep automation with PowerShell, Bash, and Python to eliminate toil across build, release, environment, and operational workflows.
- Infrastructure as Code expertise with Terraform (Azure resources: AKS, Service Bus, Event Hubs, Storage, Key Vault, Bastion).
- Proven track record reducing manual intervention, increasing repeatability, and improving MTTR through automation.
Agentic AI Experience
- Practical, production experience delivering agentic AI solutions (task orchestration, tool-use, planning, retrieval, and evaluation).
- Hands-on with LangChain, LangSmith (tracing/eval), OpenAI/Azure OpenAI, and LiteLLM integration; familiarity with prompt engineering, safety/guardrails, and LLM observability (e.g., Arize).
- Experience operationalizing AI services within DevOps pipelines and platform governance.
Technical Proficiency
- Apache NiFi expertise: authoring and governing dataflows; deploying and scaling NiFi clusters on AKS with VM Scale Sets.
- Azure services: AKS, Service Bus, Event Hubs (setup and integration), Storage Accounts (setup and integration), Key Vault, Bastion, Azure Dashboards & Kusto Query Language (KQL).
- Data/analytics: Azure Databricks, PostgreSQL, Snowflake; Power BI and SharePoint integrations; Jupyter Notebook workflows.
- Networking fundamentals: DHCP/DNS; load balancer configuration and operations (NGINX, HAProxy); Kubernetes ingress best practices.
- Messaging and email protocols: SMTP, IMAP/POP.
- Microservices and app frameworks: Python and Node.js microservices (REST APIs), Electron build and packaging.
Required Technical Skills
- Windows PowerShell; Linux/Unix administration; Bash and Python.
- Azure Cloud (architecture, security, cost, RBAC); Azure DevOps (ADO) with YAML; GitHub Actions.
- Docker and Buildah; Kubernetes (CNI), Linkerd; ELK/EFK, Prometheus, Fluentd/Fluent Bit.
- Apache NiFi flow development and clustered operations on Kubernetes with scale sets.
- Azure Databricks; PostgreSQL; Snowflake; REST APIs; ServiceNow APIs; Power BI; SharePoint.
- Azure Service Bus, Azure Event Hubs, Storage Accounts, Key Vault, Bastion.
- Jira; Jupyter Notebook; Azure Dashboards and KQL; SMTP/IMAP/POP.
- Python and Node.js microservice architecture; Electron build.
Project Management Skills
- Plan, schedule, and coordinate multi-team deliveries and releases; manage dependencies, risks, and change.
- Drive execution across platform, app, data, and AI workstreams with clear milestones and success criteria.
- Establish SLOs/SLAs and error budgets; align roadmaps to business priorities.
Communication and Interpersonal Skills
- Communicate architectural decisions, roadmaps, and trade-offs to technical and executive audiences.
- Lead cross-functional ceremonies; produce clear runbooks, architecture docs, and dashboards.
- Foster collaboration across engineering, product, security, and operations.
Analytical and Problem-Solving Abilities
- Rapid diagnosis and resolution of complex production issues; strong RCA and remediation planning.
- Attention to detail in reliability, security, performance, and cost optimization.
Adaptability and Continuous Learning
- Track and adopt evolving best practices in cloud, containers, DevOps, and agentic AI.
- Champion continuous improvement in engineering excellence and platform governance.
Experience and Education
- Typically requires 10-15+ years in software engineering, DevOps/SRE, or platform engineering with principal-level impact.
- Bachelor's degree in Computer Science, Information Technology, or related field preferred (or equivalent experience).
Secondary Skills and Experience (Desired):
Design and Development
- Define and design subsystems and interfaces; allocate responsibilities across services and platforms.
- Translate non-functional requirements (security, reliability, scalability) into concrete designs.
Technical Enablement
- Provide technical enablement for components and subsystems; drive critical design decisions and reviews.
- Establish patterns and reusable templates for CI/CD, IaC, and agentic service scaffolding.
Continuous Delivery Pipeline
- Plan, define, and implement the continuous delivery pipeline with quality gates, progressive delivery, and rollback strategies.
Architectural Runway
- Develop the architectural runway to support new features and capabilities; align with Solution and Enterprise Architects and portfolio stakeholders.
Integration
- Architect and implement integrations with external components, systems, and platforms (ServiceNow, Snowflake, Power BI, SharePoint, email systems, and enterprise identity/secrets).
Top Skills:
- Windows PowerShell; Linux/Unix administration; Bash and Python
- Azure Cloud (architecture, security, cost, RBAC); Azure DevOps (ADO) with YAML; GitHub Actions
- Docker and Buildah; Kubernetes (CNI), Linkerd; ELK/EFK, Prometheus, Fluentd/Fluent Bit
Language Skills
- English