GPU Communication - Team Lead at DRIVENETS
Tel Aviv, Tel-Aviv District, Israel
Full Time


Start Date

Immediate

Expiry Date

16 May, 26

Salary

0.0

Posted On

15 Feb, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

C/C++ Programming, Performance Optimization, NPU Programming, Triton, CUDA, HIP, OpenCL, Distributed Systems, Communication Protocols, Network Programming, PyTorch, TensorFlow, JAX, Distributed Training, Inferencing, Performance Profiling

Industry

Software Development

Description
Location: Tel Aviv (Hybrid)

DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks. It supports the largest network in the world: more than half of AT&T's backbone traffic runs on DriveNets' Network Cloud open disaggregated architecture. Having raised $587 million across three funding rounds, DriveNets is disrupting the networking market, from high-scale architecture to AI platforms, and is bringing onboard the most talented people. We are seeking people who want to make an impact on the world's leading communication networks and are experienced in networking architecture or AI infrastructure solutions.

Job Summary

We are seeking an experienced technical leader to head our collective communication library development team. This role involves leading a team of engineers in developing high-performance collective communication implementations for multi-NPU and multi-node AI workloads.
Key Responsibilities

- Lead the design and development of collective communication primitives (All-Reduce, All-to-All, Gather/Scatter, etc.)
- Architect scalable communication protocols for multi-NPU and multi-node systems
- Optimize communication performance for NPU architectures
- Provide technical leadership to team members in NPU programming, distributed systems, and communication protocols
- Work with a success-driven international team (Network, NPU, QA, AI, DL/ML Framework)
- Define project milestones, deliverables, and technical roadmaps
- Ensure compatibility with major AI frameworks (PyTorch, TensorFlow, JAX)

Requirements

Required Qualifications

- BSc/MSc in computer science, computer engineering, or equivalent
- 8+ years of experience in systems programming and distributed computing
- 5+ years of leadership experience managing technical teams
- Expert-level C/C++ programming with a focus on performance optimization
- Experience with NPU programming (Triton / CUDA / HIP / OpenCL)
- Deep understanding of distributed systems, communication protocols, and network programming
- Experience with DL/ML frameworks (PyTorch, TensorFlow) and distributed training / inferencing
- Experience with performance profiling and optimization tools
- Strong communication and interpersonal skills

Preferred Qualifications

- Experience with NPU communication library development
- Contributions to open-source projects (PyTorch, TensorFlow, communication libraries)
- Familiarity with containerization and orchestration
- Interoperability experience with partners, vendors, and external teams
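For context on the collective primitives named above, here is a minimal single-process sketch of a ring All-Reduce (sum), the pattern commonly used by NCCL-style communication libraries. This is an illustrative simulation only, not DriveNets code: real implementations run one rank per device, move data over interconnects, and overlap communication with compute.

```python
def ring_all_reduce(vectors):
    """Simulate a ring all-reduce (sum) over n 'ranks' in one process.

    vectors: list of n equal-length lists; length must be divisible by n.
    Returns the fully reduced vector that every rank ends up holding.
    """
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "vector length must be divisible by rank count"
    chunk = size // n
    # data[r] = rank r's working copy, split into n chunks
    data = [[list(v[i * chunk:(i + 1) * chunk]) for i in range(n)]
            for v in vectors]

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n to
    # rank (r + 1) % n, which accumulates it. Sends are snapshotted from the
    # pre-step state, since in a real system all ranks send simultaneously.
    for s in range(n - 1):
        sent = [data[r][(r - s) % n][:] for r in range(n)]
        for r in range(n):
            idx = (r - 1 - s) % n  # chunk arriving from rank (r - 1) % n
            recv = sent[(r - 1) % n]
            data[r][idx] = [a + b for a, b in zip(data[r][idx], recv)]

    # Phase 2: all-gather. Rank r now owns the fully reduced chunk
    # (r + 1) % n and circulates it around the ring; receivers overwrite.
    for s in range(n - 1):
        sent = [data[r][(r + 1 - s) % n][:] for r in range(n)]
        for r in range(n):
            idx = (r - s) % n
            data[r][idx] = sent[(r - 1) % n]

    result = [x for c in data[0] for x in c]
    # All ranks converge to the same reduced vector.
    assert all([x for c in data[r] for x in c] == result for r in range(n))
    return result
```

The appeal of the ring schedule is that each rank sends and receives only 2 * (n - 1) / n of the data volume regardless of rank count, which is why it remains a workhorse for bandwidth-bound multi-NPU training.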
Responsibilities
The role involves leading a team in developing high-performance collective communication implementations for multi-NPU and multi-node AI workloads, including designing and architecting scalable communication primitives and protocols. Key tasks include optimizing communication performance for NPU architectures and providing technical leadership to the team.