Principal Software Engineer - Azure AI Inferencing at Microsoft
Redmond, Washington, United States
Full Time


Start Date

Immediate

Expiry Date

21 Feb, 26

Salary

0.0

Posted On

23 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

C, C++, C#, Java, Golang, Distributed Computing, Software Engineering, Real-Time Services, Low Latency, High Throughput, Network Architecture, HTTP Protocols, TCP Protocols, Docker, Kubernetes, Cross-Team Collaboration

Industry

Software Development

Description
Lead the design and implementation of core inference infrastructure for serving frontier AI models in production. Identify and drive improvements to the end-to-end inference performance and efficiency of OpenAI and other state-of-the-art LLMs. Lead the design and implementation of efficient load scheduling and balancing strategies by leveraging key insights and features of the model and workload. Scale the platform to support growing inferencing demand and maintain high availability. Deliver the critical capabilities required to serve the latest Gen AI models, such as GPT5, Realtime audio, and Sora, and enable fast time to market for them. Collaborate with partners, both internal and external. Mentor engineers on distributed inference best practices.

Qualifications:
Bachelor's degree in Computer Science or a related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience.
4+ years of practical experience working on high-scale, reliable online systems.
Strong technical foundation in software engineering principles, distributed computing, and architecture.
Experience with real-time online services requiring low latency and high throughput.
Experience working with L7 network proxies and gateways.
Knowledge of network architecture and concepts (HTTP and TCP protocols, authentication, sessions, etc.).
Knowledge of and experience with OSS, Docker, Kubernetes, and C++, Golang, or equivalent programming languages.
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers.
Ability to independently lead projects.
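The load scheduling and balancing responsibility above typically starts from a baseline such as routing each request to the replica with the fewest outstanding requests. As an illustrative sketch only (the `replica` type and `pickLeastLoaded` function are hypothetical, not part of any Microsoft codebase), in Go:

```go
package main

import "fmt"

// replica tracks the number of in-flight requests on one model-serving instance.
type replica struct {
	name     string
	inFlight int
}

// pickLeastLoaded returns the index of the replica with the fewest
// outstanding requests -- a common baseline for inference load balancing,
// before model- and workload-aware strategies are layered on top.
func pickLeastLoaded(rs []replica) int {
	best := 0
	for i, r := range rs {
		if r.inFlight < rs[best].inFlight {
			best = i
		}
	}
	return best
}

func main() {
	rs := []replica{{"gpu-0", 3}, {"gpu-1", 1}, {"gpu-2", 2}}
	i := pickLeastLoaded(rs)
	fmt.Println(rs[i].name) // gpu-1
}
```

Real inference gateways refine this with model- and workload-specific signals (e.g. prompt length, KV-cache locality), which is the kind of design work the role describes.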
Responsibilities
Lead the design and implementation of core inference infrastructure for serving frontier AI models in production. Scale the platform to support growing inferencing demand and maintain high availability.