Senior Applied Scientist at Microsoft
Beijing, Beijing, China
Full Time


Start Date

Immediate

Expiry Date

03 Mar, 26

Salary

0.0

Posted On

03 Dec, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Model Optimization, Deployment, Machine Learning, Data Preprocessing, Quantization, Pruning, Inference Systems, Programming, PyTorch, vLLM, TensorRT-LLM, Evaluation Datasets, Collaboration, Agentic AI Systems, Statistical Analysis, Predictive Analytics

Industry

Software Development

Description
Model Optimization & Deployment:
Design and implement efficient workflows for training, distillation, and fine-tuning Small and Large Language Models (SLMs and LLMs), using techniques such as LoRA, QLoRA, and instruction tuning.
Apply model compression strategies, including quantization (e.g., GPTQ, AWQ) and pruning, to reduce inference costs and improve latency.
Optimize LLM inference performance using frameworks like vLLM and TensorRT-LLM (TRT-LLM) to enable scalable, low-latency deployment.
Build robust and scalable inference systems tailored to heterogeneous production environments, with a strong focus on performance, cost-efficiency, and stability.
Develop evaluation datasets and metrics to assess model performance in real-world product scenarios.
Build and maintain end-to-end machine learning pipelines encompassing data preprocessing, training, validation, and deployment.
Collaborate closely with product managers, engineers, and research scientists to translate business needs into impactful AI solutions, driving real-world adoption and seamless product integration.

Required Qualifications:
Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research) OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research) OR equivalent experience.
Solid programming skills with hands-on experience in managing large-scale data and machine learning pipelines.
Deep understanding of open-source ML frameworks such as PyTorch, vLLM, and TensorRT-LLM (TRT-LLM).
Solid knowledge of model optimization techniques, including quantization, pruning, and efficient inference.

Preferred Qualifications:
Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research) OR equivalent experience.
1+ years of experience optimizing LLM inference using frameworks like vLLM or TRT-LLM.
Practical experience in model compression and deployment within production systems.
Experience designing agentic AI systems, such as multi-agent orchestration, tool usage, planning, and reasoning.

Responsibilities
Design and implement efficient workflows for training and deploying language models while optimizing their performance for real-world applications. Collaborate with cross-functional teams to translate business needs into impactful AI solutions.