Machine Learning Platform Engineer

at Zoom Video Communications Inc

Remote, Oregon, USA

Start Date: Immediate
Expiry Date: 10 Sep, 2024
Salary: USD 94600 Annual
Posted On: 11 Jun, 2024
Experience: N/A
Skills: Good communication skills
Telecommute: No
Sponsor Visa: No
Required Visa Status: Citizen, GC, US Citizen, Student Visa, H1B, CPT, OPT, H4 Spouse of H1B, GC Green Card
Employment Type: Full Time, Part Time, Permanent, Independent - 1099, Contract – W2, C2H Independent, C2H W2, Contract – Corp 2 Corp, Contract to Hire – Corp 2 Corp

Description:

WHAT YOU CAN EXPECT

As a Machine Learning Platform Engineer, you’ll develop and manage Zoom’s AI infrastructure and framework. You’ll enhance AI training, deployment, and operation by improving functionality, capacity, scalability, and reliability. Your role is pivotal in shaping and optimizing Zoom’s AI capabilities.

ABOUT THE TEAM

We are seeking a passionate Machine Learning Platform Engineer to join our AI infrastructure team. Our goal is to manage the entire Machine Learning Platform, including model training and infrastructure. We aim to improve efficiency, GPU training, and the throughput and latency of language model inference.

WAYS OF WORKING

Our structured hybrid approach is centered around our offices and remote work environments. The work style of each role (Hybrid, Remote, or In-Person) is indicated in the job description/posting.

ABOUT US

Zoomies help people stay connected so they can get more done together. We set out to build the best collaboration platform for the enterprise, and today help people communicate better with products like Zoom Contact Center, Zoom Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars.
We’re problem-solvers, working at a fast pace to design solutions with our customers and users in mind. Here, you’ll work across teams to deliver impactful projects that are changing the way people communicate, and you’ll enjoy opportunities to advance your career in a diverse, inclusive environment.

Responsibilities:

  • Developing the Machine Learning Platform management system.
  • Building the toolchains, services, and pipelines for the model development workflow and the model serving architecture.
  • Prioritizing metrics for monitoring model training and inference.
  • Developing and maintaining the high-performance GPU infrastructure and cluster for LLM training.
  • Understanding autoscaling for the inference service and dynamic loading of multiple models.
  • Supporting, troubleshooting, and resolving issues that arise during training and inference.


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

LLM

Proficient

1

Remote, USA