Senior System Software Engineer at NVIDIA
Pune, Maharashtra, India
Full Time


Start Date

Immediate

Expiry Date

14 May, 26

Salary

0.0

Posted On

13 Feb, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Cloud-Native Systems, Microservices, Distributed Systems, Performance Optimization, Cost Efficiency, System Architecture, Job Orchestration, Resource Optimization, Automation, Resilience, Observability, Kubernetes, AWS, Azure, GCP, SQL

Industry

Computer Hardware Manufacturing

Description
We are now looking for a Senior System Software Engineer.

NVIDIA is the leading artificial intelligence computing company and is paving the way with innovations in self-driving cars, machine learning, supercomputing, gaming, and visualization. NVIDIA gives automakers, research institutions, cloud providers, large companies, and start-ups the power and flexibility to develop and deploy breakthrough artificial intelligence systems.

We are an enthusiastic and dedicated team at the forefront of the latest science and technology trends. Working together, we provide a private on-site cloud solution that enables the rest of the organization to quickly release high-quality software. Are you passionate about infrastructure and looking for complex and challenging issues? Are you ready to build the next generation of cloud services and design innovative solutions that address the needs of a whole organization? Then we are excited to have a motivated person like you!

What you'll be doing:
Spearhead innovation to architect and deliver highly reliable, performant, and scalable cloud-native systems.
Lead the design and development of next-generation microservices and distributed systems with a strong emphasis on performance optimization and cost efficiency.
Define and evolve system architecture strategies, ensuring alignment with long-term business and technical goals.
Tackle complex challenges in job orchestration, resource optimization, and self-healing infrastructure with a focus on automation and resilience.
Build and scale end-to-end observability solutions, including metrics pipelines, alerting frameworks, and telemetry storage.
Leverage data analytics and predictive modeling to proactively improve system behavior and reliability.
Provide technical leadership and mentorship across teams while collaborating cross-functionally with product, infrastructure, and operations groups to drive strategic initiatives and foster a culture of engineering excellence and continuous improvement.
Design and operate massively scalable systems, handling thousands to millions of jobs and servers, using deep expertise in Kubernetes and public cloud platforms (AWS, Azure, GCP).

What we need to see:
Demonstrated experience in building and scaling large-scale cloud infrastructure platforms.
10+ years of proven experience in software engineering with a strong track record of delivering enterprise-grade cloud solutions; BS/MS/Ph.D. in Computer Science, Computer Engineering, or equivalent experience.
Deep expertise in microservices architecture, with hands-on experience designing and developing scalable, distributed systems.
Extensive experience with public cloud platforms (AWS, Azure, GCP), including scaling infrastructure to support thousands to millions of jobs and servers.
Strong Kubernetes expertise, including container orchestration and cloud-native tooling for deployment, monitoring, and management.
Proficiency in both SQL (e.g., MySQL) and NoSQL (e.g., Elasticsearch) databases, with a solid understanding of scalable storage systems.
Hands-on experience with Web Services (SOAP/REST), messaging systems like Kafka, and CI/CD tools such as Jenkins, Git, and Perforce.
Excellent debugging, problem-solving, and communication skills, with the ability to lead and collaborate effectively in a globally distributed, multi-time-zone environment.

Ways to stand out from the crowd:
Proven ability to deconstruct complex systems into modular, scalable components with measurable outcomes, and to scale systems to handle millions of concurrent jobs and global workloads.
Expertise in optimizing cloud infrastructure for performance, reliability, and cost.
Solid collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic environment.
Relentless drive to push the boundaries of system performance and reliability.

We are an equal opportunity employer and value diversity at our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform crucial job functions, and to receive other benefits and privileges of employment.

NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society. Learn more about NVIDIA.
Responsibilities
The role involves spearheading innovation to architect and deliver highly reliable, performant, and scalable cloud-native systems, leading the design and development of next-generation microservices and distributed systems. Responsibilities also include tackling complex challenges in job orchestration, resource optimization, and building end-to-end observability solutions.