Start Date
Immediate
Expiry Date
30 Nov, 25
Salary
0.0
Posted On
31 Aug, 25
Experience
0 year(s) or above
Remote Job
Yes
Telecommute
Yes
Sponsor Visa
No
Skills
Python, Docker
Industry
Information Technology/IT
ABOUT HUD
HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.
Our Mission: People don’t actually know if AI agents are working. To make AI agents work in the real world, we need detailed evals for a huge range of tasks.
We’re backed by Y Combinator and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.
TECHNICAL SKILLS
How To Apply:
In case you would like to apply to this job directly from the source, please click here
ABOUT THE ROLE
We’re looking for a research engineer to help build out task configs and environments for evaluation datasets on HUD’s CUA evaluation framework.
RESPONSIBILITIES