Technical Support Engineer

Rafay Systems

New York, New York, United States

About

Rafay is redefining how enterprises and GPU cloud providers deploy, manage, and scale modern applications. Our platform delivers self-service workflows, multi-cluster orchestration, and end-to-end lifecycle management for Kubernetes and cloud-native infrastructure, empowering platform teams to operate with speed, security, and efficiency at scale. As we grow, we’re looking for a Technical Support Engineer who thrives on solving complex distributed systems problems and is passionate about delivering a world-class support experience.
Role Summary
This is a deeply technical, hands-on support role focused on diagnosing and resolving real-world production issues in Kubernetes environments. This is not a ticket triage role; you’ll be expected to own problems end-to-end.
You’ll work directly with enterprise customers running mission-critical workloads, acting as a technical escalation point across Kubernetes control planes, cluster lifecycle operations, networking, and cloud infrastructure. You’ll collaborate closely with Engineering and SRE teams to debug issues, identify root causes, and drive resolution, not just work around symptoms.
This role offers a unique opportunity to work at the cutting edge of Kubernetes, cloud infrastructure, and AI/ML platform management, while collaborating with our Customer Success and Engineering teams to ensure successful customer outcomes.
Key Responsibilities
Own and resolve advanced technical support cases involving multi-cluster Kubernetes deployments, cluster provisioning failures, and workload runtime issues across public/private clouds
Perform deep troubleshooting using tools like kubectl, cluster logs, events, and metrics to diagnose issues across control plane and data plane components
Debug and support cluster lifecycle management workflows including provisioning, upgrades, scaling, and recovery
Analyze issues related to networking (CNI), ingress, DNS, service mesh, and storage (CSI) in Kubernetes environments
Reproduce complex customer issues in internal environments and identify root cause with precision
Act as a trusted customer advocate, proactively identifying risks and working cross-functionally to resolve them
Collaborate with Engineering to escalate bugs, validate fixes, and improve product reliability
Provide clear, concise, and technically accurate communication to customers during incident resolution
Contribute to runbooks, troubleshooting guides, and knowledge base articles
Stay up to date on Rafay platform features, releases, and cloud‑native ecosystem updates.
Participate in on-call rotations to support critical customer incidents
Required Qualifications
5+ years of experience in Technical Support, SRE, or DevOps roles supporting production environments
Strong hands-on experience managing and troubleshooting Kubernetes clusters in production
Deep expertise with Kubernetes architecture, container orchestration technologies, and debugging techniques
Proven ability to troubleshoot pod lifecycle issues, cluster networking (DNS, routing, firewalls, etc.), storage, Helm deployments, and node-level issues
Strong understanding of cloud platforms (AWS, GCP, or Azure) and virtualization technologies (vSphere, OpenStack)
Solid fundamentals in Linux systems, networking (TCP/IP, DNS), and distributed systems
Experience working in customer-facing roles, handling escalations and high-severity incidents, and using support tools like Zendesk
Excellent written and verbal communication skills
Proven ability to work independently in fast‑paced, dynamic environments.
Bachelor’s degree in computer science or related field (or equivalent practical experience).
Preferred Qualifications
CKA (Certified Kubernetes Administrator) or equivalent hands-on expertise
Experience with Kubernetes ecosystem tools such as Helm, Prometheus, Grafana, and Terraform
Familiarity with multi-cluster management and GitOps workflows
Experience supporting enterprise SaaS platforms or developer infrastructure products
Exposure to AI/ML infrastructure or GPU-based workloads is a plus
Why Join Rafay?
Rafay is at the forefront of cloud-native and GPU PaaS technologies and on a mission to modernize infrastructure for the next generation of enterprise applications: cloud-native, AI/ML-driven, and highly scalable. We offer:
A front‑row seat to foundational innovations in cloud‑native and GPU PaaS technologies.
A collaborative, fast‑paced work environment with opportunities to grow and lead.
Competitive compensation, comprehensive benefits, and attractive stock options.
A culture focused on learning, ownership, and technical excellence.

Language skills

English
