On-Device ML Infrastructure Engineer (APIs & Integration) at Apple
Cupertino, California, United States
Full Time


Start Date

Immediate

Expiry Date

02 Jan, 26

Posted On

04 Oct, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, C++, ML Authoring Frameworks, PyTorch, TensorFlow, JAX, ML Fundamentals, Modeling, Transformers, Inference Optimizations, Quantization, Pruning, KV Caching, API Design, Compiler Toolchains, Communication Skills

Industry

Computers and Electronics Manufacturing

Description
Apple is the best place to do on-device machine learning, and this team sits at the heart of that field, collaborating with research, software engineering, hardware engineering, and product teams. The team is responsible for enabling the research-to-production lifecycle of innovative ML models that power magical user experiences on Apple's hardware and software platforms. We build critical infrastructure that spans onboarding the latest machine learning architectures onto embedded devices, optimization toolkits that adapt models to their target devices, and machine learning compilers and runtimes that execute these models as efficiently as possible. Our team also builds the benchmarking, analysis, and debugging toolchains needed to improve on new model iterations. This infrastructure underpins most of Apple's critical machine learning workflows across Camera, Siri, Health, Vision, and more, and our team plays an integral part in Apple Intelligence.

Our group is seeking an ML Infrastructure Engineer, with a focus on ML user-experience APIs and integration, to develop new ML model conversion and authoring APIs that will be part of coremltools (Core ML's authoring/conversion toolkit). This role takes ownership of integrating these APIs into internal and external systems (e.g., Hugging Face).

We are building the first end-to-end developer experience for ML development that, by taking advantage of Apple's vertical integration, allows developers to iterate on model authoring, optimization, transformation, execution, debugging, profiling, and analysis. The coremltools authoring and conversion APIs are the entry point to the rest of the infrastructure stack. We are looking for someone who is highly self-motivated and passionate about ML modeling (architectures, training vs. inference trade-offs, etc.) and ML deployment optimizations (e.g., quantization).
If you have a proven track record of developing and working with the internals of an ML Python library, writing high-quality code, and shipping software, we strongly encourage you to apply.

MINIMUM QUALIFICATIONS

Bachelor's degree in Computer Science, Engineering, or a related subject area.
Highly proficient in Python programming; familiarity with C++ is required.
Proficiency in at least one ML authoring framework, such as PyTorch, TensorFlow, JAX, or MLX.
Strong understanding of ML fundamentals and modeling, including common architectures such as Transformers.
Understanding of common ML inference optimizations, such as quantization, pruning, and KV caching.

PREFERRED QUALIFICATIONS

Experience with any on-device ML stack, such as TFLite or ONNX.
Experience with designing Python APIs and production deployment of Python packages is a strong plus.
Experience with Hugging Face or any other model repository.
Experience with MLIR/LLVM or other compiler toolchains.
Good communication skills, including the ability to communicate with multi-functional audiences.
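For candidates unfamiliar with the inference optimizations named above, here is a minimal sketch of symmetric int8 weight quantization in plain Python. The helper names (`quantize_int8`, `dequantize`) are hypothetical illustrations, not part of coremltools or any Apple API; real deployment toolkits operate on tensors and handle per-channel scales, calibration, and more.

```python
# Hypothetical illustration of symmetric int8 weight quantization.
# Not an Apple or coremltools API; a sketch of the underlying idea only.

def quantize_int8(weights):
    """Map float weights to int8 codes plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude maps to +/-127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [qi * scale for qi in q]

weights = [0.12, -0.5, 0.33, 1.0, -0.91]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Round-trip error is bounded by half the quantization step (scale / 2).
max_err = max(abs(w - a) for w, a in zip(weights, approx))
print(q)
print(max_err)
```

The trade-off this sketch shows is the one the role works with daily: storing 8-bit codes and one scale cuts weight memory roughly 4x versus float32, at the cost of a bounded rounding error.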
Responsibilities
The team is responsible for enabling the Research to Production lifecycle of innovative ML models and building critical infrastructure for on-device machine learning. This includes developing ML user experience APIs and integrating them into internal and external systems.