Manager, Network Simulation and Infrastructure at NVIDIA
Tel-Aviv, Tel-Aviv District, Israel
Full Time


Start Date

Immediate

Expiry Date

29 Mar, 26

Salary

0.0

Posted On

29 Dec, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

C++, DevOps, Infrastructure Management, High-Performance Computing, Simulation, Networking, Automation, Linux, CI/CD, Performance Optimization, Team Leadership, Product Ownership, System-Level Programming, Concurrent Programming, Discrete Event Simulation, Kubernetes, Docker

Industry

Computer Hardware Manufacturing

Description
NVIDIA is searching for a strong technical leader to own the backbone of our Networking Research capabilities. We are looking for an Engineering Manager to lead the development of our high-fidelity Network Simulation platform and the extensive on-premise infrastructure that powers it.

In this role, you will lead a team of performance simulation software engineers and DevOps/Infrastructure specialists. You will own the "Simulation-as-a-Service" product, a critical platform used by internal researchers to model next-generation data center architectures. Your mission is to ensure our simulations are accurate, performant, and accessible, while managing the large-scale compute clusters required to run them.

What you'll be doing:

Team Leadership: Manage and mentor a team of C++ software engineers and DevOps infrastructure engineers, fostering a culture of performance, reliability, and code quality.
Product Ownership (Sim-as-a-Service): Treat the internal simulation platform as a product. Work with research partners to define the roadmap, prioritize features, and ensure high availability for users.
High-Performance Simulation: Be responsible for the architecture and optimization of complex network simulation engines (C++ based), ensuring they can scale to model extensive data center topologies with high fidelity.
Infrastructure Management: Own the lifecycle of our on-premise compute clusters and servers. Drive decisions on hardware upgrades, prioritisation, and system resource management.
DevOps & Automation: Lead the strategy for CI/CD pipelines, automated testing, and containerized deployments to ensure rapid iteration and stability of the simulation platform.
Multi-functional Collaboration: Partner with the AI Agents team to expose simulation APIs, enabling agents to run experiments and gather data autonomously.

What we need to see:

MSc, Ph.D. or equivalent experience in Computer Science, Electrical Engineering, or a related field.
8+ years of hands-on software engineering experience, with a proven track record of leading technical teams in systems or infrastructure domains for 3+ years.
3+ years of managerial experience.
C++ Expertise: Strong background in C++ development for high-performance applications (system-level programming, concurrent programming).
Infrastructure & DevOps: Practical experience managing on-premise servers, Linux environments, and modern DevOps tools (Kubernetes, Slurm, Docker, Ansible).
Operational Rigor: Ability to manage "heavy" operations: ensuring uptime, monitoring system health, and optimizing hardware utilization.

Ways to stand out from the crowd:

Networking Knowledge: Deep understanding of computer networking fundamentals (TCP/IP, Ethernet, InfiniBand, congestion control) and data center architectures.
Simulation/Modeling: Experience with discrete event simulation (DES) or modeling complex systems.
HPC Background: Experience working with MPI, CUDA, or other High-Performance Computing frameworks.
Specific Simulators: Familiarity with standard network simulators like OMNeT++, NS-3, or similar proprietary tools.
Hardware Knowledge: Understanding of switch micro-architecture or NIC design is a significant plus.

NVIDIA is home to some of the most innovative and dedicated professionals in the industry. We are committed to fostering a diverse work environment and are proud to be an equal-opportunity employer. NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society.
Responsibilities
Lead a team of performance simulation software engineers and DevOps specialists to develop a high-fidelity Network Simulation platform. Manage the lifecycle of on-premise compute clusters and ensure the simulation platform is accurate, performant, and accessible.