Senior AI Software Architect at Microsoft
Redmond, Washington, United States
Full Time


Start Date

Immediate

Expiry Date

23 Feb, 26

Salary

0.0

Posted On

25 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

AI Models, Model Optimization, PyTorch, Quantization Techniques, Parallelization Strategies, Distributed Training, AI Inference Stacks, Performance Profiling, Triton Kernels, CUDA Programming, AI Accelerator Hardware, Embedded Systems, Model Checkpointing, Resharding Scripts, Large-Scale Model Deployments, Problem-Solving

Industry

Software Development

Description
Model Enablement:
Port and optimize large-scale AI models (e.g., foundation models, diffusion models, YOLO) to run efficiently on Maia hardware.
Integrate models using frameworks such as PyTorch, ONNX, vLLM, and SGLang.
Apply techniques like KV cache quantization (e.g., BF16 → FP8), checkpointing, and re-sharding for efficient inference and training.
Collaborate on improving inference pipelines, including KV caching in SGLang/vLLM and performance tuning at the PyTorch level.
Work with Triton kernels for basic operations (e.g., FP8 dequantization) and assist in kernel performance analysis.

Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Bachelor's Degree in Computer Science or Engineering.
3+ years of strong hands-on experience with PyTorch and model optimization techniques.
Practical knowledge of quantization techniques like PTQ/QAT, especially for KV cache quantization.
Familiarity with parallelization strategies and distributed training concepts (e.g., sharding, allreduce).
2+ years of experience with AI inference stacks like SGLang/vLLM and performance profiling.
Excellent problem-solving and communication skills; ability to work in a collaborative team environment.
3+ years of experience in Triton kernels and CUDA programming (basic understanding is acceptable but willingness to learn is essential).
Experience with AI accelerator hardware and embedded systems.
3+ years of prior work on efficient model checkpointing, resharding scripts, and large-scale model deployments for serving at scale.
Responsibilities
The Senior AI Software Architect will port and optimize large-scale AI models to run efficiently on Maia hardware and integrate models using various frameworks. The role involves collaborating on improving inference pipelines and assisting in kernel performance analysis.