Site Reliability Engineer

Sangha Partners

United States

About

Overview:

This role supports a new spin-out built from the success of a patent-protected rewards platform originally developed inside a high-growth geolocation app. The technology allows users to earn real-time rewards using their Visa or Mastercard at brick-and-mortar merchants, and it's proven valuable far beyond gaming.

The team is now building a standalone loyalty infrastructure that any app or developer can plug into, bringing real-time cashback and rewards to millions of users across different experiences.

This is a small, senior engineering group that ships quickly, moves intentionally, and prioritizes automation and operational excellence. The platform is preparing for its first major production launch at the end of Q1, which is why they're hiring their first dedicated SRE to own reliability from day one.

The Role

We're looking for a Senior Site Reliability Engineer who can build, automate, and own the systems that keep our platform healthy, observable, and ready for scale. This is an early, high-impact role where you'll set the foundation for reliability as we move into production for the first time. If you love automation, hate manual work, and treat observability dashboards like living art, you'll thrive here.

What You'll Do

Own observability end-to-end — everything goes through Datadog (logs, traces, metrics, alerts, dashboards).
Build a thoughtful alerting strategy that actually signals problems (not noise).
Create runbooks and incident workflows so the team knows exactly what to do when something breaks.
Partner with backend engineering to ensure the platform is reliable, scalable, and instrumented correctly.
Automate everything humanly possible: deployments, remediation, checks, config, integration points.
Champion SLOs, SLIs, and error budgets and help educate the team on reliability best practices.
Prep us for public production launch in late Q1 by ensuring we have a rock-solid monitoring and ops foundation.
Improve developer experience across the SDLC: CI/CD pipelines, automated testing, environments, release processes.

What You Bring

5–8+ years in SRE, DevOps, or Production Engineering roles.
Deep experience with Datadog (must): logs, traces, dashboards, monitors, alerting, APM.
Strong background with ecosystems and event-driven architectures.
Experience with relational DBs (Postgres, SQL Server, MySQL) and optimizing their reliability.
Prior work in payments, fintech, loyalty systems, or high-integrity transactional systems is a huge plus.
Pro at designing and implementing automation and self-healing systems.
Strong understanding of the full SDLC, deployment pipelines, and production readiness.
Calm under fire: you're the person who makes sure things don't catch fire.

Why Apply

Build the reliability foundation for a product launching into real production for the first time.
Join a tight, senior team where you'll have direct input on architecture, tooling, and operational standards.
Zero bureaucracy. High trust. Fast shipping environment.
A chance to shape a next-generation loyalty platform that plugs into millions of daily transactions.

Language skills

English
