Principal Software Engineer at Microsoft
United States
Full Time


Start Date

Immediate

Expiry Date

18 Feb, 26

Salary

0.0

Posted On

20 Nov, 25

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Software Engineering, Distributed Systems, AI Development, Automation, Networking, Observability, Security, Control Systems, Programming, Cloud Services, Telemetry, Data Analysis, Integration, Reliability Engineering, Continuous Delivery, Chaos Testing, Cross-Domain Collaboration

Industry

Software Development

Description
Build and Scale Autonomous Network Systems: Design and implement highly available, distributed software systems that power and maintain Azure's optical network at hyperscale. This includes everything from device-level telemetry, monitoring, and control software to globally distributed automation services that remediate and repair the network autonomously.
Full-Stack Systems Engineering: Work across the full stack, from the embedded systems running on optical devices that collect and instrument data to the cloud-scale services that analyze, decide, and act. Design for safety, resilience, observability, and rapid iteration across millions of data points per second.
Agents and Automation Platforms: Develop the next generation of AI-driven agents and orchestration platforms that enable autonomous network operations. Build contextual, sensory, and motor systems that allow agents to perceive, reason about, and act safely and securely on the network.
Context and Control Services, including Model Context Protocol/Electronic Services (MCP/eServices): Create and evolve micro-control planes and context services that give AI systems deep awareness of network state, enabling safe decision-making and intelligent automation across the optical domain.
Cross-Domain System Integration: Collaborate closely with optical, switching, and AI infrastructure teams to deliver end-to-end, self-healing systems that tie together photonic, packet, and compute control planes.
Operational Excellence and Reliability Engineering: Drive engineering rigor through metrics, observability, chaos testing, and continuous validation. Ensure the reliability and security of systems that operate some of the most mission-critical infrastructure in the world.
Innovation and Industry Leadership: Contribute to pioneering efforts in autonomous infrastructure management, continuing our track record of delivering industry-first AI agents and platforms that redefine how hyperscale networks are built and operated.
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
Bachelor's Degree in Computer Science or related technical field AND 10+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
AI Context Engineering: Building context and knowledge services for AI agents, including embeddings and retrieval, vector/time-series stores, feature pipelines, and contract-first Application Programming Interfaces (APIs) for tool exposure.
AI Agent Development & Evaluation: Designing and assessing AI agents for operational automation with offline/online evaluations, golden sets, canary/A/B testing, safety guardrails, and audit trails.
Control & Workflow Expertise: Familiarity with MCP or eServices-style control/context planes, tool interface design, and agent workflow engines such as Temporal or equivalent.
Networking & Platform Skills: Exposure to optical networking, including Dense Wavelength Division Multiplexing (DWDM) link budgeting, Optical Signal to Noise Ratio/Bit Error Rate (OSNR/BER) monitoring, transponder control, and metro/long-haul design, plus experience with Kubernetes and Continuous Integration/Continuous Delivery (CI/CD).
Software & Systems experience: 6+ years building production software for network automation and operations; 4+ years designing and running distributed, highly available services at scale; fundamentals in concurrency, reliability, and performance.
Programming & Automation experience: 1+ years building closed-loop automation, including telemetry collection, streaming/state evaluation, policy orchestration, and safe actuation on network or optical devices, and 1+ years of experience with Go or Python.
Integration & Observability experience: 1+ years with device/controller interfaces, including Network Configuration Protocol/Yet Another Next Generation (NETCONF/YANG), gRPC Network Management Interface/gRPC Network Operations Interface (gNMI/gNOI), Simple Network Management Protocol (SNMP), vendor Software Development Kits (SDKs), and Remote Procedure Call/Google Remote Procedure Call (RPC/gRPC); observability practices across metrics, logs, and traces, including Service Level Objective (SLO) design, error budgets, and on-call ownership.
Security & Leadership experience: 3+ years with a secure-by-design mindset (authentication, authorization, key/secret management, auditability) and proven ability to lead cross-functional engineering efforts from design to production with measurable outcomes.
Responsibilities
Design and implement highly available, distributed software systems for Azure's optical network. Collaborate with various teams to deliver self-healing systems and drive operational excellence.