Research Engineer, Agentic AI Evals at HUD
Singapore, Singapore
Full Time


Start Date

Immediate

Expiry Date

30 Nov, 25

Salary

0.0

Posted On

31 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Docker

Industry

Information Technology/IT

Description

ABOUT HUD

HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.
Our Mission: People don’t actually know if AI agents are working. To make AI agents work in the real world, we need detailed evals for a huge range of tasks.
We’re backed by Y Combinator, and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.

TECHNICAL SKILLS

  • Proficiency in Python, Docker, and Linux environments
  • React experience for frontend development
  • Production-level software development experience preferred
  • Strong technical aptitude and demonstrated problem-solving ability

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities

ABOUT THE ROLE

We’re looking for a research engineer to help build out task configs and environments for evaluation datasets on HUD’s CUA evaluation framework.

RESPONSIBILITIES

  • Build out environments for HUD’s CUA evaluation datasets, including evals for safety red-teaming, general business tasks, and long-horizon agentic tasks.
  • Create custom CUA datasets and evaluation pipelines - likely later, as we’re focusing on existing evals in the short term.
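To give a flavor of what "task configs" for an agent eval might look like, here is a minimal, purely illustrative sketch in Python. The `TaskConfig` shape, field names, and `evaluate` helper are assumptions for illustration only; HUD's actual CUA framework may define tasks differently.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shape of a CUA evaluation task config (illustrative only).
@dataclass
class TaskConfig:
    task_id: str
    prompt: str                                  # instruction given to the agent
    setup: dict = field(default_factory=dict)    # initial environment state
    check: Callable[[dict], bool] = lambda s: False  # grades the final state

def evaluate(task: TaskConfig, final_state: dict) -> bool:
    """Return True if the agent's final environment state passes the task's check."""
    return task.check(final_state)

# Example: a toy "business task" -- rename a file in a simulated filesystem.
task = TaskConfig(
    task_id="rename-report",
    prompt="Rename draft.txt to report.txt",
    setup={"files": ["draft.txt"]},
    check=lambda state: "report.txt" in state.get("files", []),
)

print(evaluate(task, {"files": ["report.txt"]}))  # True: file was renamed
print(evaluate(task, {"files": ["draft.txt"]}))   # False: task not completed
```

The key idea the role description implies: each task pairs an instruction with a programmatic grader over the environment's final state, so agent runs can be scored automatically and at scale.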