Research Internship: Improving Grounding in Language Models (NLP, HCI) at Microsoft
Cambridge, England, United Kingdom
Full Time


Start Date

Immediate

Expiry Date

01 Mar, 26

Salary

0.0

Posted On

01 Dec, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Natural Language Processing, Human-Computer Interaction, Linguistics, Philosophy of Language, Coding, Data Processing, Statistical Analysis, Interdisciplinary Literature Review, AI Benchmarks Development, Human Annotation Studies, Crowdsourcing, Prototype Development, Language Model APIs, Prompt Design, Research Publications

Industry

Software Development

Description
The intern will lead a literature review across linguistics and the philosophy of language to develop a taxonomy of grounding and support relations for AI-generated statements. They will contribute definitions, examples, and decision rules that make the taxonomy operational for both human annotators and LLM-as-judge evaluators. The intern will design a benchmark: selecting suitable source corpora (including recent groundedness datasets), constructing statement-source pairs, and writing clear annotation guidelines. They will run a human annotation study, potentially using crowdsourcing. Where applicable, they will help prepare bespoke annotation tooling. The intern will evaluate frontier models' ability to classify grounding categories and compare LLM-as-judge performance to that of human raters. They will co-author an academic paper describing the taxonomy, dataset, and findings.
The candidate is currently enrolled in, or has recently completed, a PhD program in human-computer interaction, natural language processing, linguistics, or philosophy of language. They have experience in coding for research prototypes, data processing, and simple statistical analysis, are comfortable reviewing and engaging with interdisciplinary literature, and have high fluency in spoken and written English.
Additional relevant experience: developing AI benchmarks; running human annotation or crowdsourcing studies; building prototypes that leverage language model APIs; prompt design and engineering; research publications at top conferences and journals in HCI, ML, or NLP.
Responsibilities
The intern will lead a literature review to develop a taxonomy of grounding for AI-generated statements and design a benchmark for evaluating models. They will also run a human annotation study and co-author an academic paper on their findings.