About
Location: Ashburn, VA - Onsite
Duration: 6 to 8 months
Role Overview:
The HPC (High-Performance Computing) Role focuses on planning, implementing, and managing InfiniBand network configurations for high-performance computing in data centers. The role emphasizes network and physical network troubleshooting (e.g., NIC testing, Ixia-enabled testing), with a skill distribution of 60% network, 30% Linux + CI/CD, and 10% HPC. Responsibilities include configuring switches, routers, and adapters, implementing security protocols, monitoring performance, troubleshooting, collaborating with vendors, and developing automation scripts.
Key Responsibilities:
- Configure and manage InfiniBand networks, including switches, routers, adapters, and performance tuning (e.g., MTU, buffer sizes, PFC/DCB for congestion management).
- Conduct physical network troubleshooting (e.g., NIC testing, Ixia-enabled testing for performance validation).
- Develop automation scripts (Python, Shell) for network tasks, leveraging libraries like Netmiko, NAPALM, Jinja; Ansible a plus.
- Monitor performance using tools like EPM/IPM; implement security protocols (MACsec, IPsec, access controls).
- Collaborate with vendors for compatibility, POCs, and BOMs; support lab/pre-field testing.
- Document configurations and processes via MOP/SOP.
Qualifications:
- Bachelor's degree in Computer Science, IT, or related field.
- 5+ years of InfiniBand experience in enterprise/lab environments.
- Expertise in InfiniBand architecture, protocols; RoCE a plus.
- Proficient in Python, Shell scripting (junior developer level, 1-2 years) for network automation; Git experience preferred.
- Strong network security (MACsec/IPsec), troubleshooting, and performance tuning skills.
- Familiarity with RDMA applications, parallel computing frameworks (e.g., MPI, OpenMP).
- Certifications (e.g., IBTA, CCNP) a plus; Linux/UNIX proficiency and CI/CD mindset required.
Skill Distribution (60/30/10):
- 60% Network: Emphasis on InfiniBand troubleshooting, NIC testing, Ixia-enabled testing, and performance tuning (e.g., PFC/DCB, MTU).
- 30% Linux + CI/CD: Linux/UNIX administration, Python/Shell scripting for automation, CI/CD familiarity (Git/Jenkins).
- 10% HPC: Basic HPC cluster knowledge, RDMA applications, parallel computing (MPI/OpenMP).
Professional experience
- DevOps
- System Engineer
- Network Engineer
Language skills
- English
Notice to users
This listing comes from a TieTalent partner platform. Click "Apply now" to submit your application directly on their site.