
This job posting is no longer available


Site Reliability Engineer (local to Cincinnati, OH)

AKAASA Technologies Inc.
  • US
    Cincinnati, Ohio, United States

About

Requirements

· years of experience in Cloud SRE, DevOps, Infrastructure, or related engineering roles

· years working with databases, web applications, microservices, event-driven systems, messaging platforms, REST APIs, integrations, and containerized environments

·      Strong knowledge of Java, Spring Boot, microservices architecture, Kafka, Cassandra, and SQL Server

·      Proficiency in Python and Shell scripting for automation and operational tooling

· year managing observability platforms such as Dynatrace, ELK, PagerDuty, Datadog, Azure Monitor, or Grafana

·      Hands-on experience with GitHub Actions for CI/CD automation

·      Strong foundation in Linux architecture, security, performance tuning, troubleshooting, and production operations

·      Experience working in Agile delivery teams

·      Ability to collaborate effectively with multi-location/global teams

·      Demonstrated ability to contribute at both a tactical and strategic level

·      Familiarity with eCommerce, fulfillment, or retail technology environments

·      Strong written, verbal, and presentation skills

Nice to Have

· years of experience designing or supporting high-volume eCommerce applications

· years configuring and managing cloud environments in Azure, AWS, or GCP

·      Hands-on experience with Kafka, Cosmos DB, Cassandra, Ansible, Terraform, Docker, Kubernetes (1+ year)

·      Experience with Nginx, HAProxy, or Squid

·      Experience building CI/CD pipelines with Jenkins, Spinnaker, Azure DevOps, or TeamCity

·      Experience implementing and managing RoyalTS or similar cross-platform remote management tools

Responsibilities

·      Partner with application engineering, observability, and support teams — along with business operations and third-party partners — to identify, prioritize, and resolve issues impacting customer pickup and delivery operations

·      Lead root cause analysis of critical business and production incidents, ensuring corrective actions and preventive measures are implemented

·      Manage and facilitate Major Incident calls for the Pickup Fulfillment domain, providing timely and accurate updates to key stakeholders during service restoration

·      Collaborate with engineering teams to continuously enhance build environments for improved reliability, speed, and scalability

·      Drive automation initiatives to improve system efficiency, deployment accuracy, and operational quality

·      Ensure system traceability, observability, and retrievability to support issue diagnosis and performance monitoring

·      Build and maintain comprehensive logging, monitoring, and alerting systems to proactively identify bottlenecks and enable performance optimization across cloud, on-prem, and in-store environments

·      Develop detailed documentation, playbooks, and design guides to support operational readiness and incident response consistency

·      Participate in off-hours on-call rotation and scheduled maintenance windows to ensure uninterrupted system availability

Language Skills

  • English