Senior Software Engineer at Microsoft
Redmond, Washington, United States
Full Time


Start Date

Immediate

Expiry Date

29 Jan, 26

Salary

0.0

Posted On

31 Oct, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Supercomputing, AI, Cloud-Native, Troubleshooting, System Reliability, Runtime Performance, Job Health, GPU Hardware, Networking, Datacenter, Core Software, Operational Gaps, Advanced Tools, Service Level Agreements, Infrastructure Stack, Engineering

Industry

Software Development

Description
The Microsoft Azure High Performance Computing & AI Engineering (HPC & AI Eng) team manages the core platform and fleet of AI High Performance Computing products that customers use to run their most demanding workloads. Within it, the AI Customer Experience (AICE) engineering team is on the front lines, managing the flagship supercomputers used by top-tier AI customers. These systems enable breakthroughs such as ChatGPT and are highlighted in the Top500, MLPerf, and Graph500 rankings. Operating at supercomputing scale requires specialized tools and techniques to ensure system reliability, runtime performance, and job health while continuing to meet customer Service Level Agreements (SLAs).

As a Senior Supercomputing Software & Systems Engineer, you will diagnose and troubleshoot the largest-scale supercomputing systems across the infrastructure stack (GPU hardware, networking, datacenter, and core software). You will develop and apply advanced tools, identify operational gaps, and implement features that support the smooth operation of cloud-native supercomputers. The role offers hands-on experience developing capabilities to manage the largest supercomputers delivered to our customers.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
As a Senior Supercomputing Software & Systems Engineer, you will diagnose and troubleshoot large-scale supercomputing systems. You will develop and apply advanced tools, identify operational gaps, and implement features to support the smooth operation of cloud-native supercomputers.