Computer Vision/Machine Learning Engineer (Text-Alignment Understanding) at Apple

Beijing, Beijing, China

Full Time

Start Date

Immediate

Expiry Date

11 Feb, 26

Salary

0.0

Posted On

13 Nov, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Computer Vision, Machine Learning, Deep Learning, Text-Image Alignment, Video Understanding, Open-Vocabulary Segmentation, Prompt-Based Vision Models, Model Consolidation, Multi-Modal Understanding, On-Device Foundation Models, Prototyping, Model Evaluation, Model Deployment, Cross-Functional Collaboration, Programming, Communication

Industry

Computers and Electronics Manufacturing

Description

If you are passionate about advancing multi-modal understanding, building models that bridge text and visual perception, and shaping the next generation of intelligent on-device experiences, Apple is the right place for you. We are looking for engineers who combine technical depth, curiosity, and creativity to push the boundaries of what machine learning can do on-device.

The computer vision algorithm engineer will work in a dynamic team as part of the Video Engineering org which develops on-device computer vision and machine perception technologies across Apple’s products. We balance research and product to deliver the highest quality, state-of-the-art experiences, innovating through the full stack, and partnering with cross-functional teams to influence what brings our vision to life and into customers’ hands. You will collaborate closely with research scientists, framework engineers, and cross-functional product teams to deliver state-of-the-art models that run efficiently across Apple’s ecosystem, from iPhone to Vision Pro.

Keywords: Concept Prompt; Text-Alignment; Open-Set Segmentation; Multi-Modal Understanding; Model Consolidation; On-Device Foundation Models

MINIMUM QUALIFICATIONS

M.S. or Ph.D. in Computer Science, Electrical Engineering, or related fields (e.g., mathematics, physics, computer engineering) with a focus on computer vision or machine learning.
Solid experience in one or more of the following: open-vocabulary segmentation, text-image alignment, prompt-based vision models, or video understanding.
Proficiency in deep learning frameworks (PyTorch, JAX) and programming languages (Python, C++).
Demonstrated ability to prototype, evaluate, and deploy models in real-world systems.
Strong written and verbal communication skills; ability to present ideas and results to diverse audiences.

PREFERRED QUALIFICATIONS

Publications in top-tier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICLR).
Experience with large-scale pretraining or multi-modal foundation models.
Understanding of generative models, visual-language alignment, or open-set recognition.
Familiarity with optimizing models for efficient inference on mobile or embedded platforms.
Passion for building scalable, high-quality systems and working in cross-functional teams.

Responsibilities

The computer vision algorithm engineer will work in a dynamic team to develop on-device computer vision and machine perception technologies across Apple’s products. You will collaborate closely with research scientists and cross-functional product teams to deliver state-of-the-art models that run efficiently across Apple’s ecosystem.