Senior Telecommunications Platform Engineer

Ladybird

United Kingdom

About

Role Summary
This role is for an engineer who has built and operated NHS emergency or healthcare telephony systems, including 999, 111, GP surgery telephony, or crisis contact centres.
You will own an on-prem, telecoms-grade platform where voice ingress, call routing, surge handling, and reliability are mission-critical.
That ownership covers the design, build, and operation of an NHS emergency communications platform, comparable in scope and responsibility to:
* NHS 999 / 111 call handling platforms
* GP surgery telephony replacements (e.g. Surgery Connect–class systems)
* Crisis, triage, or high-volume healthcare contact centres
This is not a generic infrastructure role.
You must be comfortable owning:
* Telephony ingress
* Real-time call handling
* Routing, queuing, failover
* Integration with clinical triage and decision systems
* Telecoms-grade availability expectations
The platform runs on-premises in leased UK data centres, with no managed cloud abstractions.
What You Will Own (End-to-End)
Emergency & Healthcare Telephony Platforms
You will:
* Design and operate real-time voice and telephony platforms used in healthcare or emergency contexts
* Build systems capable of handling surge traffic, call spikes, and crisis scenarios
* Design call routing, queuing, failover, and degradation behaviour
* Ensure predictable behaviour under load — especially during emergency events
* Own the platform as a clinical access system, not just infrastructure
This role assumes prior exposure to healthcare or emergency call flows, not just VoIP theory.
On-Prem Infrastructure & Platform Ownership
You will:
* Design and operate infrastructure in leased UK data centres
* Own:
  * Compute
  * Networking
  * Storage
  * Linux OS
  * Virtualisation / container platforms
* Architect bare-metal Kubernetes (self-managed control planes and workers)
* Design for telecoms-grade availability (99.99%)
* Plan and execute:
  * Capacity modelling
  * Geographic failover
  * Disaster recovery
* Operate GPU-enabled infrastructure for real-time AI workloads
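As a rough illustration of what the 99.99% availability target above implies, the sketch below converts an availability percentage into a downtime budget. The function name and the 30-day window are illustrative assumptions, not part of any stated SLA for this platform.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, allowed over the period."""
    return period_minutes * (1 - availability_percent / 100)


# Hypothetical measurement windows: one year and one 30-day month.
yearly = downtime_budget_minutes(99.99)
monthly = downtime_budget_minutes(99.99, 30 * 24 * 60)

print(f"99.99% over a year:    {yearly:.1f} minutes of downtime")
print(f"99.99% over 30 days:   {monthly:.2f} minutes of downtime")
```

Under these assumptions the budget is roughly 52.6 minutes per year, or about 4.3 minutes in any 30-day window, which is why failover and upgrades generally need to be automated rather than manual.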
Real-Time Streaming, Events & Call Intelligence
You will:
* Build and operate Kafka-based streaming platforms
* Support ordered, stateful, low-latency event processing
* Handle real-time call events, metadata, and routing decisions
* Design systems that degrade safely, never unpredictably
* Optimise for:
  * Latency
  * Throughput
  * Back-pressure
  * Reliability under stress
CI/CD, Automation & GitOps (GitLab-First)
You will:
* Design and own GitLab CI/CD pipelines
* Automate:
  * Infrastructure provisioning (Terraform / IaC)
  * Platform deployments
  * Application and model releases
* Implement GitOps workflows
* Own day-2 operations:
  * Upgrades
  * Patching
  * Rollbacks
* Reduce risk in safety-critical environments
AI-Enabled Triage & Decision Infrastructure
You will:
* Operate AI inference platforms used in triage and call decisioning
* Enable GPU scheduling, isolation, and concurrency controls
* Support:
  * Model versioning
  * Safe rollout strategies
  * Rollback paths
* Work alongside AI engineers while retaining platform ownership
* Ensure AI systems behave predictably and safely in live call flows
Reliability, Security & NHS-Grade Compliance
You will:
* Build observability using Prometheus, Grafana, and centralised logging
* Define and monitor:
  * SLIs / SLOs
  * Latency
  * Call success rates
  * Platform availability
* Lead incident response and root-cause analysis
* Implement:
  * Least-privilege access
  * Secrets management
  * Audit logging
* Harden systems for NHS, telecoms, and regulated environments
Required Experience (Strict)
You must meet most of the following:
* 5+ years hands-on engineering experience
* Direct experience building or operating:
  * NHS 999 / 111 systems
  * Emergency services telephony
  * Healthcare contact-centre platforms
  * GP surgery telephony systems
* Experience with real-time call handling systems
* Proven on-prem or private-cloud platform ownership
* Bare-metal Kubernetes (no EKS / AKS / GKE)
* Strong Linux, networking, and systems fundamentals
* Ownership of platforms where downtime is unacceptable
Strongly Preferred
* Direct NHS emergency services experience
* Background in telecoms providers or carrier-grade networks
* Experience replacing or modernising legacy healthcare telephony
* Kafka and streaming platforms in production
* GPU-backed inference systems
* Experience in regulated, safety-critical environments
What This Role Is Not
* Cloud-only DevOps
* Generic Kubernetes engineering
* Data science or ML research
* Maintaining inherited systems
* Junior or mid-level roles
This is a hands-on ownership role for engineers who have already built systems where failure carries real-world consequences.
Job Types: Part-time, Permanent, Temporary, Fixed term contract, Temp to perm, Zero hours contract, Volunteer, Internship
Contract length: 12-18 months
Pay: £21,310.10-£48,973.65 per year
Benefits:
* Casual dress
* Discounted or free food
* Flexitime
* Free parking
* On-site parking
* Referral programme
* UK visa sponsorship
* Work from home
Experience:
* Hands-on platform engineering, including building on-prem platforms: 5 years (required)
Work Location: Remote

Language skills

English
