Principal AI Architect at Microsoft
Redmond, Washington, United States
Full Time


Start Date

Immediate

Expiry Date

20 Feb, 26

Salary

0.0

Posted On

22 Nov, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

AI Systems, Hardware/Software Co-Design, Performance Engineering, AI Accelerator Architectures, GPU Architectures, Performance Tuning, Kernel Development, Cross-Disciplinary Collaboration, Distributed AI Workloads, Compiler Frameworks, Runtime Frameworks, Transformer-Based Models, Performance Modeling, Benchmarking, Workload Simulation, Technical Leadership

Industry

Software Development

Description
Model Bring-Up & Characterization
- Lead the bring-up and functional validation of LLMs on custom AI accelerators and GPUs.
- Develop and maintain detailed performance characterizations across compute, memory, and interconnect domains.
- Instrument and profile end-to-end training and inference workloads to identify scaling inefficiencies and performance gaps.
- Analyze kernel- and system-level traces to pinpoint limiting factors in compute, memory, and interconnect.
- Guide runtime and compiler improvements informed by workload analysis.
- Collaborate with teams across Azure ML, DeepSpeed, and Maia hardware programs to deliver production-grade AI infrastructure.
- Present architectural findings and recommendations to senior engineering leadership.

Qualifications
- Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or a related field AND 9+ years of technical engineering experience; OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or a related field AND 11+ years of technical engineering experience; OR equivalent experience.
- 10+ years of experience in AI systems, hardware/software co-design, or performance engineering.
- 5+ years of people management experience.
- Deep understanding of AI accelerator and GPU architectures, including compute pipelines, memory hierarchies, and interconnects.
- Proficiency with PyTorch, CUDA, Triton, or similar frameworks for performance tuning and kernel development.
- Proven track record of cross-disciplinary collaboration between hardware, software, and ML model teams.
- Experience profiling and optimizing large-scale distributed AI workloads.
- Experience with compiler and runtime frameworks (e.g., MLIR, TVM, XLA, or custom code generation flows).
- Familiarity with DeepSpeed, Megatron-LM, SGLang, or vLLM training and inference pipelines.
- Deep understanding of transformer-based model architectures and scaling behaviors.
- Hands-on experience with AI performance modeling, benchmarking, or workload simulation.
- Demonstrated technical leadership and communication skills in highly collaborative environments.
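
As a rough illustration of the end-to-end workload profiling this role describes, here is a minimal sketch assuming PyTorch and its built-in torch.profiler; the TransformerEncoderLayer model, tensor shapes, and device handling are illustrative placeholders, not anything specific to this position or to Maia hardware.

```python
# Minimal profiling sketch (assumes a recent PyTorch with torch.profiler);
# the model and input shapes below are placeholders for a real workload.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).to(device).eval()
x = torch.randn(128, 8, 512, device=device)  # (sequence, batch, d_model)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

# Capture operator- and kernel-level timings for a single forward pass.
with torch.no_grad(), profile(activities=activities, record_shapes=True) as prof:
    with record_function("forward_pass"):
        model(x)

# Sort by self time on the accelerator (or CPU) to surface the limiting kernels.
sort_key = "self_cuda_time_total" if device == "cuda" else "self_cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```

Deeper kernel- and system-level trace analysis would typically layer vendor tools (for example Nsight Systems/Compute) or the profiler's Chrome trace export on top of a capture like this; that is beyond the scope of this quick sketch.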
Responsibilities
Lead the bring-up and functional validation of LLMs on custom AI accelerators and GPUs. Collaborate with teams across Azure ML, DeepSpeed, and Maia hardware programs to deliver production-grade AI infrastructure.