This job posting is no longer available
Infrastructure Manager
Air-tek
- Toronto, Ontario, Canada
About
- Ensure the uptime, performance, and overall reliability of Air‑Tek's platform in alignment with established SLOs.
- Oversee the successful deployment of new code, services, and infrastructure into production environments.
- Analyze, tune, and optimize systems to operate at maximum efficiency, availability, and resiliency.
- Build a strong culture of observability, embedding monitoring and alerting best practices across the organization.
- Build automation and tooling that reduce manual toil and enable our engineering teams to deliver efficiently.
- Continuously improve CI/CD workflows and deployment processes alongside engineering partners.
- Lead and participate in the team's on‑call rotation, ensuring rapid response to and resolution of critical incidents.
- Foster continuous improvement through blameless postmortems, root‑cause analysis, and reliability‑focused engineering initiatives.
- Partner with development teams to integrate reliability principles earlier in the software development lifecycle.
- Mentor and develop SRE team members, fostering a culture of learning, ownership, and engineering excellence.
- Contribute to the vision and roadmap of the SRE function, driving initiatives that promote scalability, automation, and resilience.
- Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience.
- Significant hands-on experience as a Site Reliability Engineer or Platform Engineer, including 2+ years in a leadership or managerial role.
- Strong experience with production monitoring and logging tools such as AWS CloudWatch and Datadog.
- Hands‑on experience with: System Administration & Containers (Docker, Linux), Cloud Platforms (AWS), Databases (MongoDB Atlas, PostgreSQL, AWS Aurora), CI/CD & Deployment (GitHub Actions, Argo CD), Infrastructure & Environment Management (Pulumi, Terraform, Kubernetes), Data Streaming & Messaging (Kafka, RabbitMQ).
- Proficiency in programming and scripting languages.
- Strong analytical and problem‑solving skills with the ability to tackle complex technical challenges.
- Excellent written and verbal communication skills with the ability to clearly explain technical concepts and designs.
- Empathy, composure, and clarity in high‑pressure incident situations.
Language skills
- English
Notice to users
This posting was published by one of our partners. You can view the original posting here.