Site Reliability Engineer

66degrees

Vancouver, British Columbia, Canada

About

Overview of Role

66degrees' Managed Cloud Optimization (MCO) team works with some of the largest cloud users in the world to help them transform their businesses with technology. Our Site Reliability Engineers (SREs) combine Google Cloud Platform expertise with a passion for DevOps methodologies to help our clients maintain, optimize, and scale their cloud implementations.

On a daily basis, our SREs work with varied and exciting customers on topics ranging from solving critical outages to designing and deploying new cloud workloads to building self-healing automation. Our SREs work with cutting-edge Google Cloud technologies like Google Kubernetes Engine (GKE), Anthos, BigQuery, and data pipelines, as well as leading third-party tools like Prometheus, Datadog, and many others. Our SREs also use languages and tools like Python and Terraform to create automation, deploy infrastructure, and contribute to open-source projects.

If you're looking to continually build and apply your Google Cloud expertise to new and varied environments while acting as a key contributor to building the best Google consulting partner in the industry – let's talk.

Note: Pacific and Mountain Time Zones preferred. This role has a weekend on-call rotation.

Responsibilities

Ensure near-zero downtime through monitoring and alerting, self-healing automation, and continuous improvement
Create highly automated, available and scalable systems by applying software and infrastructure principles
Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale
Provide a proactive approach to our clients' workloads, anticipating failures, automating tasks, ensuring availability, and providing a great customer experience
Work closely with clients, your team, and Google engineers to investigate and resolve infrastructure issues
Manage a Jira queue of inbound requests for numerous clients while effectively balancing and prioritizing projects
Contribute to ad-hoc initiatives such as writing documentation, open-sourcing, and improving operations, making a huge impact at a rapid-growth Google Premier Partner

Qualifications

Minimum 4 years of cloud and infrastructure experience, including demonstrated expertise with Linux, Windows, k8s, databases, and networking services
2 years of full-time Google Cloud experience preferred
Proficiency with Python required. Other programming language experience is a plus
Strong provisioning and configuration skills using Terraform
Experience in troubleshooting that spans systems, network, and code
Microsoft Server and SQL Server experience is a plus but not required
Experience with 24x7x365 monitoring, incident response, and on-call support preferred
Experience determining and negotiating error budgets, SLIs, SLOs, and SLAs with product owners
Demonstrated ability to work independently and as a member of a larger team, including cross-team activities
Experience working with Agile methodologies (Scrum, Kanban) across the SDLC
Proven experience balancing service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale
Strong communication skills, as this is a heavily customer-facing role
A Bachelor's degree in Computer Science, Computer Engineering, or a related field, or equivalent work experience, is required

Languages

English
