This job posting is no longer available
About
We are seeking a highly experienced Principal Platform Engineer to design, build, and operate secure, scalable, and highly reliable cloud platforms. This role sits at the intersection of platform engineering, site reliability engineering (SRE), and infrastructure security, supporting mission‑critical distributed systems including financial services and blockchain‑based platforms. You will lead the development of resilient multi‑cloud infrastructure, drive reliability and observability standards, and enable engineering teams through self‑service platforms, automation, and GitOps‑based delivery models.
Key Responsibilities:
Platform Engineering & Architecture
- Design and operate large‑scale, multi‑region infrastructure across AWS, GCP, and Azure
- Build and evolve Kubernetes platforms (EKS, AKS, GKE) for high‑availability production workloads
- Define platform standards, golden paths, and reusable infrastructure patterns
- Architect secure environments, including confidential computing and enclave‑based systems
- Perform deep troubleshooting across Linux kernel, networking stack, storage, and system performance layers
- Optimize systems for low‑latency and high‑throughput workloads (CPU pinning, NUMA awareness, IRQ tuning, disk I/O optimization)
- Diagnose and resolve complex production issues using system‑level tools (e.g., perf, eBPF, strace, tcpdump)
- Tune OS‑level parameters for containerized and distributed environments
- Define and implement SLOs/SLIs and drive reliability improvements across services
- Lead incident response, post‑incident reviews, and systemic resilience improvements
- Improve MTTR through observability, automation, and operational excellence practices
- Conduct failure‑mode analysis, chaos testing, and capacity planning
Infrastructure as Code & Delivery
- Build fully automated infrastructure using Terraform, Terragrunt, and related tooling
- Implement GitOps workflows using tools like Argo CD
- Develop secure CI/CD pipelines with policy enforcement, provenance, and gated releases
- Enable zero‑touch deployments and self‑service developer platforms
- Define and implement observability strategies across metrics, logs, and traces
- Work with tools such as Datadog, Prometheus, and OpenTelemetry
- Improve alert quality, reduce noise, and build actionable runbooks
- Drive adoption of distributed tracing and end‑to‑end visibility
Qualifications:
- 10+ years in Platform Engineering, SRE, DevOps, or Linux Systems Engineering roles
- Deep expertise in Kubernetes (EKS, AKS, GKE) and cloud‑native architectures
- Strong Linux systems knowledge, including kernel behavior, networking, and performance tuning
- Proven experience in multi‑cloud environments (AWS, GCP, Azure)
- Proven track record operating production systems with high availability (99.9%+)
- Hands‑on experience with Infrastructure as Code (Terraform, Terragrunt)
- Strong understanding of observability, monitoring, and incident response
- Experience implementing GitOps and modern CI/CD pipelines
- Programming/scripting experience (Go, Python, or Bash)
Language skills
- English