Evaluation & Insights Engineer at Apple

Seattle, Washington, United States

Full Time

Start Date

Immediate

Expiry Date

08 Jun, 26

Salary

0.0

Posted On

10 Mar, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Python, Data Analysis, Pandas, NumPy, Jupyter, Data Science, Model Evaluation, Qualitative Insights, LLMs, Generative AI, NLP Models, Prompt Engineering, RAG Systems, Annotation Guidelines, Data Interpretation, AI Model Behavior

Industry

Computers and Electronics Manufacturing

Description

Imagine what you could do here. At Apple, great new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish! Are you passionate about music, movies, and the world of Artificial Intelligence and Machine Learning? So are we! Join our Human-Centered AI team for Apple Products. In this role, you'll represent the user perspective on new features, review and analyze data, and evaluate AI models powering everything from search and recommendations to other innovative features. You'll collaborate with Data Scientists, Researchers, and Engineers to drive improvements across our platforms.

DESCRIPTION

We are looking for an Evaluation & Insights Engineer for the Human-Centered AI team to help evaluate and improve AI systems by combining data science, model behavior analysis, and qualitative insights. In this role, you will analyze AI outputs, develop evaluation frameworks, design qualitative assessments, and translate findings into actionable improvements for product and engineering teams. This role blends deep technical expertise with strong analytical judgment to assess, interpret, and improve the behavior of advanced AI models. You will work cross-functionally with Engineering and Project Managers as well as Product and Research teams to ensure that AI experiences are reliable, safe, and aligned with human expectations.

MINIMUM QUALIFICATIONS

Bachelor’s or Master’s degree in Data Science, Computer Science, Linguistics, Cognitive Science, HCI, Psychology, or a related field.
At least 5 years of relevant job experience.
Proficiency in Python for data analysis (pandas, NumPy, Jupyter, etc.).
Experience working with large datasets and designing model-evaluation pipelines, taxonomies, categorization schemes, or structured rating frameworks.
Analytical Strength: Ability to interpret unstructured data (text, transcripts, user sessions) and stitch together qualitative and quantitative findings into actionable guidance.

PREFERRED QUALIFICATIONS

Experience working directly with LLMs, generative AI systems, or NLP models.
Familiarity with evaluations specific to AI quality, hallucination detection, or model alignment.
Experience building internal tools, scripts, or dashboards for evaluation workflows.
Familiarity with prompt engineering, RAG systems, or model fine-tuning.
Experience evaluating LLMs, multimodal models, or other generative AI systems at scale.
Expertise in designing annotation guidelines and managing large-scale annotation projects.
Background in human factors, social science, or qualitative assessment methodologies.
Proficiency in a language besides English.

Responsibilities

This role involves analyzing AI outputs, developing evaluation frameworks, designing qualitative assessments, and translating findings into actionable improvements for product and engineering teams. The engineer will collaborate cross-functionally to ensure AI experiences are reliable, safe, and aligned with human expectations.