Senior DevOps Engineer

SDI International

United States

About

No H1 or C2C. Must be a Permanent Resident or US Citizen.
Senior DevOps Platform Engineer

Description and Requirements

About Our Team

We are building Quantum, a next‑generation hybrid AI platform that spans Windows, Android, and cloud. As part of this vision, we are expanding the engineering organization supporting cross‑device Personal AI. We are hiring Senior DevOps / Platform Engineers to build and operate the core automation, infrastructure, and service platforms that enable secure, reliable, and high‑velocity delivery of AI systems across device, edge, and cloud. Depending on your experience and organizational need, you may be aligned to Platform Engineering, Observability, Operations, or Service Reliability.

The team operates with the speed, ownership, and creativity of a startup, supported by scale, resources, and technical depth. We are building foundational systems from the ground up, intentionally, pragmatically, and with a culture of engineering excellence.

Location

Open to remote work in the US. The preferred work location is Chicago, IL.

What You Might Work On

As a Senior DevOps / Platform Engineer, you may be responsible for a subset of the following areas, depending on team placement.

CI/CD, Automation & Tooling

Designing, implementing, and improving CI/CD pipelines for AI, platform, and application teams.
Building automation and developer tooling to improve productivity and consistency.
Developing infrastructure‑as‑code for cloud and hybrid environments (Terraform, Bicep, etc.).

Platform & Infrastructure Engineering

Implementing scalable, secure, and resilient infrastructure on Azure and Kubernetes.
Building and operating hybrid systems spanning device, edge, and cloud compute.
Enabling reliable platform services that support inference, data pipelines, and high‑performance AI workloads.

Observability & Telemetry

Implementing and enhancing observability systems using OpenTelemetry, Grafana, Prometheus, Loki, and related tooling.
Ensuring platform telemetry is accurate, actionable, and tied to performance and reliability outcomes.
Building dashboards and analytics for service health and operational insight.

Deployment & Release Engineering

Improving deployment workflows, safety, consistency, and traceability.
Supporting progressive delivery patterns, including canaries, staged rollouts, and automated rollbacks.
Optimizing CI/CD and deployment tooling for hybrid AI services.

Collaboration & Reliability Culture

Partnering closely with SRE, AI/ML, security, firmware, and product engineering teams.
Contributing to system design discussions with a focus on automation, scalability, and operational best practices.
Helping define and evolve platform engineering standards, patterns, and conventions.

Basic Qualifications

10+ years in DevOps, Platform Engineering, Cloud Engineering, or related fields
Bachelor’s Degree in Computer Science, Engineering, or a related technical field
Strong experience building and operating infrastructure in Azure, AWS, or GCP
Proficiency with CI/CD systems, build automation, and deployment pipelines
Experience with Infrastructure as Code (Terraform, ARM/Bicep, CloudFormation, etc.)
Strong development or scripting skills (Python, Go, Bash, or similar)
Hands-on experience with Docker and Kubernetes
Understanding of observability fundamentals (metrics, logs, tracing)

Preferred Qualifications

Deep experience with Azure cloud architecture and DevOps tooling
Strong hands‑on work with OpenTelemetry (instrumentation, pipelines)
Experience with Grafana, Prometheus, Loki, Tempo, or similar observability tools
Experience supporting AI/ML workloads or GPU‑accelerated compute environments
Familiarity with event‑driven systems and operationalizing data pipelines
Experience contributing to or running on‑call rotations
Passion for automation, developer experience, and infrastructure reliability at scale

What Success Looks Like

CI/CD pipelines are fast, stable, and trusted.
Platform infrastructure becomes more automated, observable, and scalable.
Telemetry and dashboards provide clear visibility into system health.
Deployments are consistent, safe, and repeatable.
Engineering teams move faster thanks to strong platform foundations.
The hybrid AI platform becomes increasingly reliable, efficient, and easy to operate.

Languages

English

