Platform ML Engineering Manager, Model Graph

at OpenAI

San Francisco, California, USA

Start Date: Immediate
Expiry Date: 16 Feb, 2025
Salary: USD 530,000 Annual
Posted On: 19 Nov, 2024
Experience: N/A
Skills: Good communication skills
Telecommute: No
Sponsor Visa: No

Description:

ABOUT THE TEAM

The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference.
Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models), with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.

Responsibilities:

ABOUT THE ROLE

We are looking for an experienced engineering manager to help lead critical work on model definition and efficient distributed execution within our shared internal training stack. Research uses this stack for both large-scale and small-scale runs.

IN THIS ROLE, YOU WILL:

  • Reduce the time it takes to try out new architecture ideas for training new models and increase the robustness of model code.
  • Collaborate closely with researchers and other systems engineers to maximize the benefits of our shared internal training stack.
  • Make it feasible to achieve state-of-the-art (SOTA) throughput for our most important research models.
  • Hire world-class AI systems engineers in one of the most competitive hiring markets.
  • Coordinate the training needs of OpenAI’s research teams.
  • Create a diverse, equitable, and inclusive culture that makes everyone feel welcome while enabling radical candor and the challenging of groupthink.

YOU MIGHT THRIVE IN THIS ROLE IF YOU:

  • Have 3+ years of experience in engineering management and 7+ years as an IC working with high-scale distributed systems and ML systems.
  • Have experience with ML systems, particularly high-scale distributed training or inference for modern LLMs.
  • Have familiarity with the latest AI research and working knowledge of how these systems are efficiently implemented.
  • Care deeply about diversity, equity, and inclusion, and have a track record of building inclusive teams.


REQUIREMENT SUMMARY

Experience: Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

Graduate

Proficient

1

San Francisco, CA, USA