Member of Technical Staff, Pre-training Infra GPU at Microsoft
Redmond, Washington, United States - Full Time


Start Date

Immediate

Expiry Date

02 Mar, 26

Salary

0.0

Posted On

02 Dec, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, CUDA, AI Models, Performance Optimization, Benchmarking, Debugging, Collective Communication Libraries, NCCL, NVLink, InfiniBand, Distributed Computing, Machine Learning, Generative AI, Networking, Storage Systems, High-Performance Computing

Industry

Software Development

Description
Responsibilities:
- Design, implement, test, and optimize AI models in Python and CUDA C++ for large-scale GPU clusters.
- Profile, benchmark, and debug performance bottlenecks across compute, memory, and networking subsystems.
- Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
- Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, AMD, and beyond).
- Gather data and insights to develop the pretraining compute roadmap.
- Care deeply about conversational AI and its deployment.
- Actively contribute to the development of AI models powering our innovative products.
- Find a path to get things done despite roadblocks, getting your work into the hands of users quickly and iteratively.
- Enjoy working in a fast-paced, design-driven product development cycle.
- Embody our Culture and Values.

Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR equivalent experience.
- Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
- Proven ability to profile, benchmark, and optimize performance-critical systems.
- Experience leading technical projects and supporting architectural decisions with data.
- Experience building infrastructure for large-scale machine learning or generative AI workloads.
- Deep expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms.
- Strong background in distributed computing and large-scale systems.
- Track record of contributing to high-performance computing or large-scale AI infrastructure projects.
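For context on the benchmarking work this role describes: comparing collective-communication performance across cluster sizes usually means converting a measured all-reduce time into a "bus bandwidth" figure. A minimal sketch, assuming the 2*(n-1)/n ring all-reduce convention used by NVIDIA's nccl-tests (the helper name and numbers here are hypothetical):

```python
def allreduce_bandwidth(bytes_per_rank: float, seconds: float, n_ranks: int):
    """Return (algorithmic_bw, bus_bw) in GB/s for an all-reduce.

    alg_bw is simply message size over time. bus_bw rescales it by
    2*(n-1)/n, the per-link traffic factor of a ring all-reduce, so the
    number stays comparable as the rank count grows (nccl-tests convention).
    """
    alg_bw = bytes_per_rank / seconds / 1e9
    bus_bw = alg_bw * 2 * (n_ranks - 1) / n_ranks
    return alg_bw, bus_bw

# Example: a 1 GB all-reduce over 8 ranks completing in 10 ms.
alg, bus = allreduce_bandwidth(1e9, 0.01, 8)
```

With these inputs, alg_bw is 100 GB/s and bus_bw is 175 GB/s; the bus figure is the one typically compared against NVLink or InfiniBand link speeds.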
Responsibilities
Design, implement, test, and optimize AI models for large-scale GPU clusters. Collaborate with hardware teams to optimize for next-generation accelerators and contribute to the development of AI models powering innovative products.