Staff Site Reliability & DevOps Engineer

Brandwatch

United States

About

Staff Site Reliability & DevOps Engineer - Observability
At Cision, we believe in empowering every individual to make an impact. Here, your voice is heard, your ideas are valued, and your unique perspective fuels our collective success. As part of our global team, you'll thrive in an environment that champions curiosity, collaboration, and innovation, all while making meaningful contributions to the brands we accelerate.
Join us in shaping the future of communication and building authentic connections that matter. Whether you're solving complex problems or driving bold innovations, your growth is our success, and together, we’ll create the conversations of tomorrow.
Empower your impact at Cision. Be seen, be understood, be you.
This role focuses on designing, operating, and evolving observability platforms with a strong emphasis on metrics, logging, and alerting. The primary tooling is Grafana and Prometheus, with responsibility for ensuring production systems are observable, reliable, and operable at scale. The role works closely with platform, infrastructure, and application teams.
Key responsibilities
Design, build, and operate observability platforms based on Grafana and Prometheus
Define and maintain metrics standards, dashboards, alerts, and SLOs
Improve signal quality: reduce alert noise, tune thresholds, and improve runbooks
Support incident response by providing actionable telemetry and post-incident analysis
Integrate metrics, logs, and traces across distributed systems
Work with engineering teams to instrument services correctly
Automate observability configuration using infrastructure as code
Contribute to reliability improvements through capacity planning and performance analysis
Required skills and experience
Strong experience with Prometheus (scraping, federation, recording rules, alerting)
Strong experience with Grafana (dashboards, alerting, templating, RBAC)
Solid Linux and networking fundamentals
Experience running observability stacks in Kubernetes environments
Infrastructure as code experience (Terraform preferred)
Familiarity with incident management and on-call practices
Ability to debug production systems using metrics and logs
Nice to have
Experience with logs and traces (e.g. Loki, Tempo, OpenTelemetry)
Experience operating large-scale or multi-cluster Kubernetes platforms
Experience with cloud platforms (GCP, AWS, OCI)
Exposure to SRE concepts such as error budgets and SLO-driven prioritisation
What success looks like
Engineers trust dashboards and alerts to reflect system health
Incidents are detected earlier and diagnosed faster
Alert fatigue is reduced and on-call quality improves
Observability is treated as a first-class platform capability
Cision is proud to be an equal opportunity employer, seeking to create a welcoming and diverse environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, or other protected statuses.
Cision is committed to the full inclusion of all qualified individuals. In keeping with our commitment, Cision will take steps to ensure that people with disabilities are provided reasonable accommodations. Accordingly, if reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential functions of the position, and/or to receive all other benefits and privileges of employment, please contact hr.support@cision.com.
Languages

English
