Manager, Infra Tools AI at NVIDIA
Raanana, Center District, Israel
Full Time


Start Date

Immediate

Expiry Date

29 Jul, 26

Salary

0.0

Posted On

30 Apr, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Infrastructure Engineering, DevOps, CI/CD, LLM, AI, SONiC Network OS, Networking Protocols, Linux, Automation Frameworks, System Internals, Ethernet Switching, Leadership, Mentoring, Software Engineering, Data-driven Decision Making

Industry

Computer Hardware Manufacturing

Description
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. We are now seeking a highly motivated Infrastructure, Tools & AI Engineering Manager to join our Ethernet Switching group, working on SONiC Network OS. In this role, you will own and drive the engineering infrastructure that powers the full product development lifecycle, from development environments and CI pipelines through regression, code coverage, and test efficiency. You will apply cutting-edge AI and LLM capabilities to transform how we analyze failures, generate test coverage, and accelerate product quality.

What you’ll be doing:
Design, build, and maintain scalable infrastructure for development, integration, and test environments supporting SONiC OS.
Architect and deliver LLM-based tools for intelligent regression analysis: failure classification, root cause clustering, anomaly detection, and test flakiness prediction.
Lead efforts to reduce regression runtime through parallelization, smart test selection, and dependency-aware scheduling.
Develop deep technical knowledge of SONiC Network OS internals, including its subsystem architecture, SAI/ASIC abstraction layer, and management plane.
Lead and mentor a team of infrastructure and tooling engineers; set technical direction, define priorities, and grow team capabilities.

What we need to see:
B.Sc. degree or higher in Computer Science, Software Engineering, or a related field, or equivalent experience.
8+ overall years of software engineering experience, with at least 3 years in an infrastructure, DevOps, or tooling leadership role.
Strong Python programming skills; experience building production-quality automation frameworks and tooling.
Demonstrated experience designing and operating CI/CD systems at scale (Jenkins, GitLab CI, GitHub Actions, or equivalent).
Hands-on experience with LLMs or AI-assisted developer tooling: building, integrating, or productizing AI capabilities in an engineering workflow.
Proven ability to lead technical teams: hiring, mentoring, technical roadmapping, and cross-team influence.
Strong analytical and problem-solving skills with a bias toward measurable outcomes and data-driven decisions.

Ways to stand out from the crowd:
Deep Linux expertise: system internals, networking stack, process management, and scripting.
Prior experience building LLM-powered test analysis pipelines or AI-enhanced DevOps tooling in a real production environment.
Knowledge of networking protocols and hardware: Ethernet switching, L2/L3 protocols, QoS, VLANs, high-performance data center networking.
Experience with code coverage instrumentation in large-scale C/Python codebases and using coverage data for test prioritization.
Track record of measurably improving regression runtime, test reliability, or CI throughput in a complex embedded or systems software environment.

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.
Responsibilities
You will lead the engineering infrastructure team to manage the full product development lifecycle for SONiC Network OS. Additionally, you will architect and implement AI and LLM-based tools to enhance regression analysis, test efficiency, and product quality.