Principal Engineering Manager

at Microsoft

Redmond, WA 98052, USA

Start Date: Immediate
Expiry Date: 05 Jul, 2024
Salary: USD 276,600 Annual
Posted On: 05 Apr, 2024
Experience: 2 year(s) or above
Skills: Consideration, Mentoring, Computer Science, Management Skills, Base Pay, Regulations, Performance Management, Ordinances, Color, Crisis Management, Stakeholder Management, C++, Continuous Improvement, C, Microsoft, Python, Citizenship, Ethnicity, Online Services
Telecommute: No
Sponsor Visa: No
Required Visa Status: Citizen, GC, US Citizen, Student Visa, H1B, CPT, OPT, H4 Spouse of H1B, GC Green Card
Employment Type: Full Time, Part Time, Permanent, Independent - 1099, Contract - W2, C2H Independent, C2H W2, Contract - Corp 2 Corp, Contract to Hire - Corp 2 Corp

Description:

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.
The AI Platform organization at Microsoft builds the end-to-end Azure AI stack/PaaS and is core to Azure’s innovation and differentiation, as well as all of Microsoft’s flagship products, from Office to Teams, to Xbox. We are the team building Azure OpenAI, Azure ML, Cognitive Services, and the global Azure AI infrastructure for running the largest AI workloads on the planet. We are looking to hire a Principal Engineering Manager to join our team.
We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.
Within AI Platform, the Azure ML team enables data scientists and developers to quickly and easily build, train, deploy, manage, and consume machine learning models.
The Azure OpenAI service is responsible for all OpenAI model workloads on Azure. Within this service, the runtime team is responsible for operationalizing and managing the core model inference stack. This team is responsible for model performance across offerings (provisioned throughput, pay-as-you-go, batch) as well as modalities (image, text, audio, embeddings) for all scenarios. The service needs to run at a 99.9%+ service level agreement (SLA) while maximizing utilization on one of the largest accelerator fleets in the world at aggressive latency SLA and service level objective (SLO) points.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

REQUIRED QUALIFICATIONS:

  • Bachelor’s Degree in Computer Science, or related technical discipline AND 8+ years technical engineering experience with coding in C, C++, C# or Python
  • OR equivalent work experience
  • 4+ years people management experience.
  • 4+ years experience building and managing large scale distributed systems that includes experience with online services, user facing APIs, containerized workloads and monitoring stacks.
  • 3+ years experience with incident and crisis management, with the ability to lead the team during service outages and ensure swift resolution and communication with stakeholders.

OTHER QUALIFICATIONS:

Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

PREFERRED/ADDITIONAL QUALIFICATIONS:

  • Deep experience in cloud service technologies and architectures, particularly with Azure services, to effectively manage and optimize OpenAI service deployments.
  • Experience with monitoring, logging, and diagnostic tools (e.g., Azure Monitor, Application Insights, Prometheus, Grafana) to ensure visibility and proactive management of service health.
  • Demonstrated ability to foster a culture of innovation, collaboration, and continuous improvement within the team.
  • Experience in SLA management, including defining, measuring, and owning KPIs such as uptime, latency, and throughput.
  • Capacity planning and resource management skills to ensure the service can scale efficiently while controlling costs.
  • Experience in stakeholder management, with the ability to communicate complex technical issues and plans to both technical and non-technical audiences.
  • Proven track record of leading and scaling engineering teams, including hiring, mentoring, and performance management.
  • Experience in setting clear expectations and objectives, managing team workload, and ensuring on-time delivery of projects with high quality.
Software Engineering M6 - The typical base pay range for this role across the U.S. is USD $158,500 - $276,600 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $202,800 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until April 10, 2024.

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Responsibilities:

  • Drive end-to-end release to market of new OpenAI models and Application Programming Interface (API) functionalities globally across layers of the Azure stack.
  • Establishes processes and policies for deployment according to business and engineering policies to ensure high quality output.
  • Holds accountability to make difficult and impactful decisions, and for any deployment-related impacts on the product or service and any related outcomes.
  • Manage key performance indicators (KPIs), SLAs and performance related customer issues for the range of OpenAI model offerings.
  • Provides technical leadership and guidance in efforts to collect, classify, analyze, and interpret data and analyses on a range of metrics (e.g., health of the system, where bugs might be occurring).
  • Interprets and applies findings from analyses to make informed decisions in engineering products through data integration.
  • Drive automation and best-in-class observability to enable seamless scale-out and DevOps across Azure clouds.
  • Ensures ongoing support for services or products is robust and effective through sound telemetry and incident response processes.
  • Establishes guidelines and policies for creating telemetry, engaging in live site maintenance, and responding to incidents.
  • Provides technical oversight of telemetry in systems and products to give feedback on system behavior such as performance, reliability, availability, and utility, and implements safety mechanisms that create iterative feedback loops informing subsequent designs.
  • Provides technical leadership for creating outputs of telemetry such as notifications or dashboards.
  • Ensures appropriate systems are enacted to reduce incident volume and severity, meets the strategic needs of the product or service, and drives a live-site-first mentality.
  • Embody our Culture and Values.


REQUIREMENT SUMMARY

Min: 2.0, Max: 7.0 year(s)

Information Technology/IT

IT Software - Other

Other

Graduate

C, C++, C# or Python

Proficient

1

Redmond, WA 98052, USA