Senior On Premise Platform Engineer
Ladybird
Manchester, England, United Kingdom

This job posting is no longer available

About

Role Summary
We are seeking a *Principal Platform Engineer with 7+ years of hands-on experience* designing, building, and operating *on-premise, bare-metal platforms* in *leased data-centre environments*.
You will lead the *end-to-end build* of a *mission-critical NHS 999 telecommunications platform*, supporting *real-time AI inference*, *streaming workloads*, and *telecoms-grade availability*.
This role requires *direct ownership* of physical infrastructure, private cloud platforms, and self-managed Kubernetes.
There is *no reliance on managed cloud services* (no EKS / AKS / GKE).
Key Responsibilities
On-Prem & Leased Data-Centre Platform Ownership
* Design and build *on-premise infrastructure* hosted in *leased UK data centres*
* Architect and operate *bare-metal Kubernetes clusters* (control plane + workers)
* Own *compute, networking, storage, Linux OS, and platform architecture*
* Design platforms capable of *99.99% availability*
* Plan and execute *capacity management, failover, and disaster recovery*
* Operate *GPU-enabled infrastructure* for AI inference and training
* Build systems suitable for *NHS 999 and emergency communications workloads*
Kubernetes, CI/CD & Automation (GitLab)
* Design and maintain *GitLab CI/CD pipelines* (build, test, deploy)
* Automate:
  * Infrastructure provisioning (Terraform / IaC)
  * Kubernetes deployments
  * AI model and application releases
* Implement *GitOps workflows*
* Own *day-2 operations*, including upgrades, patching, and rollbacks
* Minimise deployment risk in *safety-critical environments*
Real-Time Streaming & Telecoms Systems
* Build and operate *Kafka-based streaming platforms*
* Support *sub-second latency* event processing
* Design for *traffic spikes, back-pressure, and failure scenarios*
* Ensure predictable behaviour under *999 call surges*
* Optimise systems for *latency, throughput, and resilience*
MLOps & AI Platform Infrastructure
* Operate *production AI inference platforms* (KServe, Seldon, Triton, or similar)
* Enable *GPU scheduling, isolation, and concurrency controls*
* Support *model versioning, retraining pipelines, and lifecycle management*
* Implement:
  * Canary releases
  * Versioned deployments
  * Safe rollback paths
* Work closely with *AI engineers*, retaining platform ownership
Reliability, Security & NHS Compliance
* Build observability using *Prometheus, Grafana, and centralised logging*
* Define and monitor *SLIs, SLOs, latency, uptime, and error budgets*
* Lead *incident response and root-cause analysis*
* Implement *least-privilege access*, secrets management, and audit controls
* Harden platforms for *NHS, telecoms, and regulated environments*
Required Experience (Non-Negotiable)
Candidates *must meet most of the following*:
* *7+ years* hands-on platform / infrastructure engineering experience
* Proven experience building *on-prem or private-cloud platforms*
* Experience operating *leased data-centre infrastructure*
* *Bare-metal Kubernetes* (self-managed, not EKS / AKS / GKE)
* Strong Linux, networking, and storage fundamentals
* *GitLab CI/CD* pipeline design and ownership
* Experience with *telecommunications or NHS environments*
* Ownership of *production systems with strict uptime requirements*
Strongly Preferred
* NHS 999, emergency services, or healthcare platforms
* Telecommunications background (BT, Vodafone, carrier networks)
* Kafka and real-time streaming in production
* GPU-based AI inference workloads
* Terraform and Infrastructure as Code
* Experience in *regulated, mission-critical environments*
What This Role Is *Not*
* Cloud-only DevOps
* Data science or ML research
* Junior or mid-level engineering
* Platform consumption or inherited systems
This is a *hands-on, build-from-scratch platform ownership role*.
Working Environment
* *Location:* Manchester M35 9BD (On-site required)
* *Infrastructure:* Leased Data Centres · Bare Metal · Private Cloud
* *Stack:* Kubernetes · GitLab · Kafka · GPU · Linux
Final Screening Statement
If you have *not personally built and operated on-premise platforms* in *leased data centres*, this role will not be suitable.
If you have *7+ years of experience delivering telecoms-grade or NHS-grade platforms*, and are comfortable owning systems *where failure is not an option*, we want to hear from you.
Job Types: Part-time, Permanent, Temporary, Fixed term contract, Temp to perm, Zero hours contract, Volunteer, Internship
Contract length: 12-18 months
Pay: £48,973.65-£100,679.17 per year
Expected hours: 10 – 20 per week
Benefits:
* Casual dress
* Discounted or free food
* Flexitime
* Free parking
* On-site parking
* Referral programme
* UK visa sponsorship
* Work from home
Ability to commute/relocate:
* Manchester M35 9BG: reliably commute or plan to relocate before starting work (required)
Experience:
* Hands-on platform engineering, including building on-prem platforms: 7 years (required)
Work Location: Hybrid remote in Manchester M35 9BG

Language skills

  • English