
This job posting is no longer available

Senior Linux Platform Engineer

jobtraffic
  • IE
    Ireland

About

Overview

We are seeking a highly technical Senior Platform Engineer with deep expertise in Linux Engineering, OpenStack development, Kubernetes, and GPU-enabled infrastructure to design, build, and operate SIG’s next-generation infrastructure platforms supporting trading and core technology environments.


This is a hands‑on engineering role focused on building and tuning scalable, resilient, and high‑performance infrastructure systems across CPU and GPU workloads. The ideal candidate will have strong Linux internals knowledge, experience developing and operating cloud‑native platforms, and a deep understanding of distributed systems architecture, including the efficient provisioning, isolation, and performance tuning of accelerator‑based compute resources.


What We're Looking For
Linux Systems Engineering

  • Deep troubleshooting across kernel, networking stack, storage, and performance layers.
  • Performance tuning for low‑latency systems (CPU pinning, NUMA, IRQ balancing, kernel tuning).
  • Develop automation using Python, Go, or similar languages.
  • Build and maintain infrastructure tooling and internal platform services.
  • Implement high‑availability solutions and disaster recovery strategies.
  • Perform root cause analysis for production incidents affecting distributed systems.
  • Design, deploy, and operate GPU‑enabled infrastructure.
  • Optimize GPU utilization (memory bandwidth, PCIe throughput, Multi‑Process Service, MIG partitioning where applicable).
  • Tune workloads to efficiently leverage NVIDIA GPUs (or equivalent accelerators) for compute‑intensive applications.
  • Troubleshoot GPU driver, CUDA, kernel module, and firmware‑related issues in production environments.

OpenStack Development & Cloud Infrastructure

  • Develop and extend OpenStack services (Nova, Neutron, Cinder, Keystone, etc.).
  • Build custom integrations and automation around OpenStack APIs.
  • Optimize compute, networking, and storage performance for high‑performance workloads.
  • Design multi‑tenant OpenStack architectures with strong isolation and security.
  • Contribute to infrastructure‑as‑code frameworks managing OpenStack environments.
  • Debug and resolve deep issues across hypervisors (KVM), networking layers, and control plane services.
  • Integrate OpenStack environments with Kubernetes platforms (hybrid cloud architectures).

Kubernetes Platform Engineering

  • Design, build, and operate highly available, production‑grade Kubernetes clusters.
  • Develop and maintain Kubernetes operators, controllers, and custom resource definitions (CRDs).
  • Implement advanced scheduling, multi‑tenancy, and workload isolation strategies.
  • Optimize cluster performance for low‑latency and high‑throughput workloads.
  • Integrate Kubernetes with CI/CD pipelines and GitOps workflows.
  • Implement cluster observability using Prometheus, Grafana, OpenTelemetry, etc.
  • Design and enforce network policies (CNI) and ingress architecture.
  • Implement secure cluster design including RBAC, OPA/Gatekeeper, secrets management, and runtime security.

Automation & Infrastructure as Code

  • Design and maintain infrastructure using Terraform, Ansible, Helm, or similar tools.
  • Build CI/CD pipelines for infrastructure and platform deployments.
  • Implement immutable infrastructure and GitOps methodologies.
  • Create automated validation, testing, and deployment frameworks for platform services.

Required Technical Skills

  • Advanced Linux systems knowledge (kernel, networking, storage)
  • Experience deploying and operating GPU‑enabled Linux servers
  • Understanding of CUDA drivers and GPU kernel modules
  • Performance profiling and workload tuning for compute‑intensive applications
  • Hands‑on OpenStack development and operations experience
  • Strong experience administering and engineering production Kubernetes clusters
  • Strong understanding of distributed systems principles:

    • Consensus
    • Replication
    • Fault tolerance
    • CAP theorem tradeoffs

  • Experience with:

    • Python or similar programming languages
    • Infrastructure as Code (Terraform, Ansible)
    • Container runtimes (containerd, CRI‑O)
    • Observability stacks (Prometheus, Grafana, ELK)


Desirable Experience

  • Experience in low‑latency or high‑performance trading environments
  • High‑performance networking (DPDK, SR‑IOV, CNI tuning)
  • Storage systems (Ceph, distributed storage, NVMe optimization)
  • Contribution to open‑source projects (Kubernetes, OpenStack)
  • Experience designing multi‑region or hybrid cloud architectures
  • Experience tuning AI/ML, quantitative, or high‑performance compute workloads on GPUs
  • Experience with NVIDIA DCGM, MIG (Multi‑Instance GPU), or vGPU configurations
  • Familiarity with RDMA, GPUDirect, or high‑throughput interconnects
  • Experience optimizing containerized ML or compute pipelines

Key Attributes

  • Strong systems thinking and deep technical curiosity
  • Ability to diagnose complex cross‑layer failures
  • Passion for building reliable, scalable distributed systems
  • Comfortable operating in high‑availability, high‑performance production environments
  • Strong documentation and knowledge‑sharing mindset


Language skills

  • English