On-device ML Infrastructure Engineer (ML Modeling Semantics & Representation) at Apple
Cupertino, CA 95014, USA
Full Time


Start Date

Immediate

Expiry Date

25 Jul, 25

Salary

$312,200

Posted On

25 Apr, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Optimization Techniques, Communication Skills, Interpersonal Skills, C++

Industry

Information Technology/IT

Description

The On-Device Machine Learning team at Apple is responsible for enabling the research-to-production lifecycle of cutting-edge machine learning models that power magical user experiences on Apple’s hardware and software platforms. This team sits at the heart of that discipline, interfacing with research, software engineering, hardware engineering, and products. Our group is looking for an ML Infrastructure Engineer with a focus on ML model semantics, representation, and optimizations. The role involves working with ML research and applied research engineers to onboard the newest ML architectures to CoreML’s model representation, including evolving that representation to support the latest features in the authored ML program (e.g., PyTorch) and exposing Apple’s on-device execution capabilities. The role is also responsible for building critical “bridging” infrastructure between the most widely used ML frameworks (e.g., PyTorch) and Apple’s CoreML stack.

Key responsibilities:

  • Develop technologies to quickly onboard new ML models to our on-device stack, including contributions to ML authoring frameworks.
  • Understand different ML operations, architectures, and graph representations across authoring frameworks, and keep abreast of the latest innovations in this space.
  • Architect and build CoreML’s model representation so that it can efficiently capture program semantics from the authoring frameworks while allowing for peak execution performance.
  • Define and build the user-facing model translation and ingestion abstractions, APIs, and surrounding toolkit to allow seamless model import into Apple’s ML stack (see the sketch after this list).
  • Perform optimizations such as quantization and operator transformations to make models more amenable to efficient on-device deployment.
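For illustration only (not part of the posting): a minimal sketch of the kind of PyTorch-to-Core ML bridging described above, using the public coremltools converter. The model, input shape, and file names are placeholders chosen for the example.

    import torch
    import torchvision
    import coremltools as ct

    # Illustrative model and input; any traceable PyTorch module works similarly.
    model = torchvision.models.mobilenet_v3_small(weights=None).eval()
    example_input = torch.rand(1, 3, 224, 224)

    # Capture the authored program's graph via TorchScript tracing.
    traced = torch.jit.trace(model, example_input)

    # Translate the traced graph into Core ML's ML Program representation.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=example_input.shape)],
        convert_to="mlprogram",
    )
    mlmodel.save("MobileNetV3.mlpackage")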

DESCRIPTION

As an engineer in this role, you will be primarily focused on the interplay between higher-level ML authoring frameworks (such as PyTorch, JAX, and MLX) and Apple’s on-device ML infrastructure. The role requires an understanding of ML modeling (architectures, training vs. inference trade-offs, etc.) and ML deployment optimizations (compression, distillation, quantization, hardware optimizations, etc.). We are building the first end-to-end developer experience for ML development that, by taking advantage of Apple’s vertical integration, allows developers to iterate on model authoring, optimization, transformation, execution, debugging, profiling, and analysis. ML representation, translation, and optimization form the entry point of this infrastructure stack.
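As an illustration of the deployment optimizations mentioned above, here is a hedged sketch of post-training weight quantization using coremltools.optimize (available in coremltools 7 and later); the specific configuration and file names are assumptions, not part of the posting.

    import coremltools as ct
    import coremltools.optimize.coreml as cto

    # Load a previously converted ML Program (path is a placeholder).
    mlmodel = ct.models.MLModel("MobileNetV3.mlpackage")

    # Quantize weights to 8-bit integers with symmetric linear quantization.
    op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric")
    config = cto.OptimizationConfig(global_config=op_config)
    quantized = cto.linear_quantize_weights(mlmodel, config=config)
    quantized.save("MobileNetV3_int8.mlpackage")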

MINIMUM QUALIFICATIONS

  • Bachelor’s degree in Computer Science, Engineering, or a related discipline.
  • High proficiency in Python programming; familiarity with C++ is required.
  • Proficiency in at least one ML authoring framework, such as PyTorch, TensorFlow, JAX, or MLX.
  • Strong understanding of ML fundamentals, including common architectures such as Transformers.

PREFERRED QUALIFICATIONS

  • Hands-on experience working with and/or developing ML optimization techniques such as quantization.
  • Experience with accelerators and GPU programming is a strong plus.
  • Experience with an on-device ML stack, such as TFLite or ONNX.
  • Experience with the MLIR compiler stack is a strong plus.
  • Excellent communication and interpersonal skills, including the ability to communicate with cross-functional audiences.
Responsibilities

Please refer to the job description above for details.
