Software Engineer, Agentic Evaluation at Apple

Cupertino, California, United States

Full Time

Start Date

Immediate

Expiry Date

19 Aug, 26

Salary

0.0

Posted On

21 May, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Swift, Objective-C, Python, Generative AI, Software Evaluation, iOS Development, macOS Development, Test Infrastructure, Data Pipelines, Concurrent Application Architecture, System Services, UI Frameworks, A/B Testing, Model-Graded Evaluation, Scripting, Software Engineering

Industry

Computers and Electronics Manufacturing

Description

We're a team at Apple building software that helps shape the next generation of Siri and AI-powered experiences. The work spans frameworks, tooling, and infrastructure, including a strong focus on how we evaluate and measure the quality of what we ship. We can't say much about specifics, but the problems are new, the surface area is large, and the reach is enormous. We're a collaborative, humble, and curious group that learns from each other and builds together.

You'll work alongside engineers, designers, and researchers to design and build software end-to-end, from early prototypes to production systems running on real devices. You'll have meaningful autonomy in how you get there, and the opportunity to shape both what we build and how we know it's working. The work is hard enough to stretch you, and the team is generous enough to support you while you grow.

Minimum Qualifications

3+ years of software engineering experience with strong CS fundamentals
Proficiency in Swift, Objective-C, Python, or another modern language; strong engineers in adjacent stacks will pick up the rest
You've shipped software that people used, and you're ready to own bigger pieces end-to-end
Expert in using generative AI models for coding: you've integrated tools like Claude, Cursor, or Codex deeply into how you work, and have a point of view on where they help and where they don't
An interest in software evaluation and quality: you care about whether what you build actually works, and want to be on a team that takes measurement seriously
Comfortable with ambiguity; when you're stuck, you dig in
Strong communication and a track record of working well across teams
BS in Computer Science or equivalent experience

Preferred Qualifications

Experience in one or more iOS/macOS domains: system services, UI frameworks, concurrent application architecture, or performance
Background building developer tools, test infrastructure, evaluation systems, or data pipelines
Familiarity with how AI systems are evaluated: offline eval, human eval, A/B, or model-graded approaches
Proficiency with one or more scripting languages (Python, Ruby, Bash)
You seek out feedback and learn fast from those around you
Close to the frontier: curious about new models and techniques, and have a point of view on where human-AI interaction is headed

Responsibilities

Design and build end-to-end software, from prototypes to production systems, to support the next generation of Siri and AI experiences. Focus on creating frameworks, tooling, and infrastructure to evaluate and measure the quality of AI-powered features.