About
As a Network Engineer on the Cluster Architecture Team, you will work closely with vendors, internal networking teams, and industry peers to develop a best-in-class interconnect architecture for current and future generations of Cerebras AI clusters. You will be responsible for developing proofs of concept for new network designs and features that enable a resilient and reliable network for AI workloads. The role requires cross-functional collaboration and interaction with diverse hardware components (e.g., network devices and the Wafer-Scale Engine) as well as software at several layers of the stack, from host-side networking to cluster-level coordination. The role also requires an understanding of network monitoring systems and network debugging methodologies.
Responsibilities
- Design AI/ML and HPC clusters with a focus on network technology.
- Identify and address performance or efficiency bottlenecks, ensuring high resource utilization, low latency, and high throughput communication.
- Stay current on emerging networking technologies: evaluate new hardware, fabrics, and protocols to improve cluster performance, scalability, and cost efficiency.
- Drive technical projects involving multiple teams, various software and hardware components coming together to realize advanced networking technologies.
- Bring effective communication skills.
- Collaborate with vendors and industry peers to drive network hardware and feature roadmap.
- Pre-deployment readiness & port mapping: build and validate rack/row and patch-panel port maps and cabling plans, in the rare cases where this is required.
- Bring-up & rare deployment debugging: assist with lab/staging validation, packet captures, link level diagnostics, and synthetic traffic tests.
Qualifications
- Ph.D. in Computer Science or Electrical Engineering with 10 years of industry experience, or Master's in CS or EE with 15 years of industry experience.
- 5 years of experience in large-scale network design for WAN or data center environments.
- Extensive experience debugging networking issues in large distributed-systems environments with multiple networking platforms and protocols.
- Experience managing and leading multi-phase, multi-team projects.
- Networking platforms such as Juniper, Arista, and Cisco, and open-box architectures (SONiC, FBOSS).
- Networking protocols and technologies such as RoCE, BGP, DCQCN, PFC, and streaming telemetry.
- Familiarity with languages used for automation, such as Python or Go.
- Familiarity with network visibility and management systems.
Language skills
- English