Computer Vision / Machine Learning Engineer (Video Generation) at Apple

Beijing, Beijing, China

Full Time

Start Date

Immediate

Expiry Date

13 May, 26

Salary

0.0

Posted On

12 Feb, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Generative Video Modeling, Video Prediction, Temporal Modeling, Frame Interpolation, Diffusion Models, Transformer Models, PyTorch, JAX, Python, C++, Computer Vision, Machine Learning, Model Optimization, On-Device Deployment, Systems Thinking

Industry

Computers and Electronics Manufacturing

Description

If you are passionate about advancing video generation, building state-of-the-art models that synthesize high-quality and controllable video, and optimizing them for on-device deployment, Apple is the right place for you. We are looking for engineers who combine deep technical expertise, creativity, and systems thinking to push the boundaries of video AI.

DESCRIPTION

As part of Apple’s Video Engineering org, you will develop models and infrastructure for video generation and understanding across Apple products. You will work on cutting-edge generative techniques, from diffusion and transformer-based models to frame interpolation and temporal modeling, while ensuring models run efficiently on iPhone, iPad, and Vision Pro. You will collaborate with research scientists, framework engineers, and cross-functional teams to design, train, optimize, and deploy scalable video generation systems.

MINIMUM QUALIFICATIONS

M.S. or Ph.D. in Computer Science, Electrical Engineering, or a related field, with a focus on computer vision or machine learning.
Strong experience in one or more of: generative video modeling, video prediction, temporal modeling, or frame interpolation.
Proficiency in deep learning frameworks (PyTorch, JAX) and programming languages (Python, C++).
Experience with large-scale training pipelines and deploying models in real-world systems.
Strong written and verbal communication skills.

PREFERRED QUALIFICATIONS

Publications in top-tier conferences (CVPR, ECCV, ICCV, NeurIPS, ICLR).
Experience with multi-modal video or text-video generation.
Familiarity with optimizing generative models for mobile/embedded devices.
Understanding of temporal consistency, controllable generation, and efficient infrastructure for large-scale video modeling.
Passion for building scalable, high-quality systems in cross-functional teams.

Responsibilities

The engineer will develop models and infrastructure for video generation and understanding across Apple products, focusing on cutting-edge generative techniques such as diffusion and transformer-based models. They will collaborate with research scientists, framework engineers, and cross-functional teams to design, train, optimize, and deploy scalable video generation systems that run efficiently on devices such as iPhone, iPad, and Vision Pro.