Senior On Premise Platform Engineer • Ladybird • Manchester, England, United Kingdom

This job listing is no longer available.

About

Role Summary
We are seeking a *Principal Platform Engineer with 7+ years of hands-on experience* designing, building, and operating *on-premise, bare-metal platforms* in *leased data-centre environments*.
You will lead the *end-to-end build* of a *mission-critical NHS 999 telecommunications platform*, supporting *real-time AI inference*, *streaming workloads*, and *telecoms-grade availability*.
This role requires *direct ownership* of physical infrastructure, private cloud platforms, and self-managed Kubernetes.
There is *no reliance on managed cloud services* (no EKS / AKS / GKE).
Key Responsibilities
On-Prem & Leased Data-Centre Platform Ownership
* Design and build *on-premise infrastructure* hosted in *leased UK data centres*
* Architect and operate *bare-metal Kubernetes clusters* (control plane + workers)
* Own *compute, networking, storage, Linux OS, and platform architecture*
* Design platforms capable of *99.99% availability*
* Plan and execute *capacity management, failover, and disaster recovery*
* Operate *GPU-enabled infrastructure* for AI inference and training
* Build systems suitable for *NHS 999 and emergency communications workloads*
Kubernetes, CI/CD & Automation (GitLab)
* Design and maintain *GitLab CI/CD pipelines* (build, test, deploy)
* Automate:
  * Infrastructure provisioning (Terraform / IaC)
  * Kubernetes deployments
  * AI model and application releases
* Implement *GitOps workflows*
* Own *day-2 operations*, including upgrades, patching, and rollbacks
* Minimise deployment risk in *safety-critical environments*
Real-Time Streaming & Telecoms Systems
* Build and operate *Kafka-based streaming platforms*
* Support *sub-second latency* event processing
* Design for *traffic spikes, back-pressure, and failure scenarios*
* Ensure predictable behaviour under *999 call surges*
* Optimise systems for *latency, throughput, and resilience*
MLOps & AI Platform Infrastructure
* Operate *production AI inference platforms* (KServe, Seldon, Triton, or similar)
* Enable *GPU scheduling, isolation, and concurrency controls*
* Support *model versioning, retraining pipelines, and lifecycle management*
* Implement:
  * Canary releases
  * Versioned deployments
  * Safe rollback paths
* Work closely with *AI engineers* while retaining platform ownership
Reliability, Security & NHS Compliance
* Build observability using *Prometheus, Grafana, and centralised logging*
* Define and monitor *SLIs, SLOs, latency, uptime, and error budgets*
* Lead *incident response and root-cause analysis*
* Implement *least-privilege access*, secrets management, and audit controls
* Harden platforms for *NHS, telecoms, and regulated environments*
Required Experience (Non-Negotiable)
Candidates *must meet most of the following*:
* *7+ years* hands-on platform / infrastructure engineering experience
* Proven experience building *on-prem or private-cloud platforms*
* Experience operating *leased data-centre infrastructure*
* *Bare-metal Kubernetes* (self-managed, not EKS / AKS / GKE)
* Strong Linux, networking, and storage fundamentals
* *GitLab CI/CD* pipeline design and ownership
* Experience with *telecommunications or NHS environments*
* Ownership of *production systems with strict uptime requirements*
Strongly Preferred
* NHS 999, emergency services, or healthcare platforms
* Telecommunications background (BT, Vodafone, carrier networks)
* Kafka and real-time streaming in production
* GPU-based AI inference workloads
* Terraform and Infrastructure as Code
* Experience in *regulated, mission-critical environments*
What This Role Is *Not*
* Cloud-only DevOps
* Data science or ML research
* Junior or mid-level engineering
* Platform consumption or inherited systems
This is a *hands-on, build-from-scratch platform ownership role*.
Working Environment
* *Location:* Manchester M35 9BD (On-site required)
* *Infrastructure:* Leased Data Centres · Bare Metal · Private Cloud
* *Stack:* Kubernetes · GitLab · Kafka · GPU · Linux
Final Screening Statement
If you have *not personally built and operated on-premise platforms* in *leased data centres*, this role will not be suitable.
If you have *7+ years of experience delivering telecoms-grade or NHS-grade platforms*, and are comfortable owning systems *where failure is not an option*, we want to hear from you.
Job Types: Part-time, Permanent, Temporary, Fixed term contract, Temp to perm, Zero hours contract, Volunteer, Internship
Contract length: 12-18 months
Pay: £48,973.65-£100,679.17 per year
Expected hours: 10 – 20 per week
Benefits:
* Casual dress
* Discounted or free food
* Flexitime
* Free parking
* On-site parking
* Referral programme
* UK visa sponsorship
* Work from home
Ability to commute/relocate:
* Manchester M35 9BG: reliably commute or plan to relocate before starting work (required)
Experience:
* hands-on platform engineering, including building on-prem: 7 years (required)
Work Location: Hybrid remote in Manchester M35 9BG

Language skills

English

Note for users

This job listing was published by one of our partners.