Principal Software Engineer at Microsoft
Beijing, Beijing, China
Full Time


Start Date

Immediate

Expiry Date

18 Feb, 26

Salary

Not disclosed

Posted On

20 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

C/C++, Python, CUDA, ROCm, Triton, GPU Kernels, Performance Analysis, Optimization, NVIDIA Visual Profiler, NVIDIA Nsight Compute, LLM Inference Optimization, TensorRT-LLM, SGLang, vLLM, Problem Solving, Cross-team Collaboration

Industry

Software Development

Description
Work on software development in C/C++, Python, and GPU languages such as CUDA, ROCm, or Triton.
Analyze metrics and identify opportunities based on offline and online testing, then develop and deliver robust and scalable solutions.
Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost.
Engage with key partners to understand and implement inference and training optimization for state-of-the-art LLMs and other models.

Bachelor's degree in computer science or a related technical field AND 5+ years of technical engineering experience coding in languages including, but not limited to, C/C++, CUDA, or ROCm, OR equivalent experience.
Practical experience writing new GPU kernels, going beyond running GPU workloads with existing library kernels.
Quick learning, good communication (fluent in English), and solid problem-solving skills.
Cross-team collaboration skills and the desire to work in a team of researchers and developers.
Experience in low-level performance analysis and optimization, including proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute, is a plus.
Familiarity with LLM inference optimization and experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM is a plus.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Responsibilities
Work on software development in C/C++, Python, and GPU languages to deliver robust and scalable solutions. Engage with partners to implement inference and training optimization for state-of-the-art models.