Lead Site Reliability Engineer, Observability (Remote, North America)
- Vancouver, British Columbia, Canada
About
Vivun delivers Ava, the AI Sales Teammate for high‑velocity sales teams. As Lead Observability Engineer, you’ll rebuild and own our observability strategy across both agentic systems and SaaS infrastructure, creating frameworks and tooling that enable teams to ship confidently, measure performance, and maintain reliability as we scale.
Base Pay
$185,000 – $205,000 per year.
Position Summary
As the Observability Lead, you’ll design and implement Vivun’s observability patterns spanning infrastructure, applications, and agentic workloads. You’ll work closely with engineering, QA, and product to establish unified visibility across the full stack, from LLM‑driven agents to backend services. You won’t just monitor systems; you’ll define the patterns and tools that empower and drive Vivun’s engineering culture.
Key Responsibilities
- Own the end‑to‑end observability strategy for Ava, defining standards, tools, and patterns that ensure reliable visibility.
- Design and implement correlation models linking agent behavior, LLM interactions, and SaaS telemetry into actionable insights.
- Unify observability tooling across teams, ensuring metrics, logs, and traces flow into a central platform.
- Collaborate with engineering and QA to embed observability best practices into workflows, CI/CD, and quality gates.
- Establish enablement frameworks—documentation, dashboards, templates—that make observability self‑serve.
- Partner to align observability with infrastructure reliability, alerting, and incident response.
- Contribute to performance and reliability strategy, defining agent quality, responsiveness, and scalability metrics.
Qualifications
- 6+ years in SRE, DevOps, or Observability Engineering, with 2+ years leading observability initiatives.
- Deep knowledge of OpenTelemetry, Prometheus, Grafana, Datadog, Honeycomb, Observe, etc.
- Experience with Agentic/LLM‑based systems (LangChain, Celery, OpenAI APIs, orchestration frameworks).
- Strong understanding of instrumenting, tracing, and correlating AI/LLM workflows with infrastructure telemetry.
- Proven ability to define cross‑team standards, influence culture, and establish scalable monitoring patterns.
- Strong collaboration and communication skills—enable, not dictate.
- Experience building observability into hybrid SaaS plus agent architectures.
- Background in data pipelines or analytics observability.
- Familiarity with Python‑ or Node.js‑based SDKs.
- Prior experience scaling observability in a startup or rapid‑growth environment.
- A believer in Vivun’s core values: Set the Standard, Take Ownership, Stay Curious, Fast & Focused.
- Builder at heart: eager to build observability foundations for a next‑generation agentic platform.
- Innovative problem solver: ready to tackle cutting‑edge monitoring at the intersection of SaaS and AI.
- Collaborative: thrive in a high‑impact engineering culture that values enablement.
- Experienced in high‑growth startup environments; fast, adaptable, and goal‑driven.
Benefits
- Competitive salary and full health benefits.
- Stock options at a well‑funded, pre‑IPO company on a fast growth track.
- Flexible work schedules; fully remote.
- Unlimited PTO, plus a two‑week quiet period each year.
- An experienced team that will fight beside you to achieve goals.
Seniority Level
Mid‑Senior Level
Employment Type
Full‑time
Job Function
Engineering and Information Technology
Industries
Technology, Information and Internet
Language Skills
- English