Research Engineer, Agentic AI Evals at HUD
Singapore, Singapore
Full Time


Start Date

Immediate

Expiry Date

30 Nov, 25

Salary

0.0

Posted On

31 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Docker

Industry

Information Technology/IT

Description

ABOUT HUD

HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.
Our Mission: People don’t actually know if AI agents are working. To make AI agents work in the real world, we need detailed evals for a huge range of tasks.
We’re backed by Y Combinator, and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.

TECHNICAL SKILLS

  • Proficiency in Python, Docker, and Linux environments
  • React experience for frontend development
  • Production-level software development experience preferred
  • Strong technical aptitude and demonstrated problem-solving ability

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities

ABOUT THE ROLE

We’re looking for a research engineer to help build out task configs and environments for evaluation datasets on HUD’s CUA evaluation framework.

RESPONSIBILITIES

  • Build out environments for HUD’s CUA evaluation datasets, including evals for safety red-teaming, general business tasks, and long-horizon agentic tasks.
  • Create custom CUA datasets and evaluation pipelines - likely later, as we’re focusing on existing evals in the short term.
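To give a flavor of what "task configs" for an agent eval might look like, here is a minimal, purely illustrative sketch in Python. The `TaskConfig` shape, field names, and `evaluate` helper are assumptions for illustration only; HUD's actual CUA framework may define tasks differently.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shape of a CUA evaluation task config (illustrative only).
@dataclass
class TaskConfig:
    task_id: str
    prompt: str                                  # instruction given to the agent
    setup: dict = field(default_factory=dict)    # initial environment state
    check: Callable[[dict], bool] = lambda s: False  # grades the final state

def evaluate(task: TaskConfig, final_state: dict) -> bool:
    """Return True if the agent's final environment state passes the task's check."""
    return task.check(final_state)

# Example: a toy "business task" -- rename a file in a simulated filesystem.
task = TaskConfig(
    task_id="rename-report",
    prompt="Rename draft.txt to report.txt",
    setup={"files": ["draft.txt"]},
    check=lambda state: "report.txt" in state.get("files", []),
)

print(evaluate(task, {"files": ["report.txt"]}))  # True: file was renamed
print(evaluate(task, {"files": ["draft.txt"]}))   # False: task not completed
```

The key idea the role description implies: each task pairs an instruction with a programmatic grader over the environment's final state, so agent runs can be scored automatically and at scale.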