Research Scientist Intern, Core AI - Vision and Multi-Modal (PhD) at Meta
Burlingame, California, United States
Full Time


Start Date

Immediate

Expiry Date

22 Jan, 26

Salary

0.0

Posted On

24 Oct, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, Deep Learning, PyTorch, Computer Vision, Multi-Modal Language Models, Generative Models, Large Scale Vision Model Training, Vision Data Collection, Machine Learning, Reinforcement Learning, Deep Learning Methods, Research Experience, Publications, Coding Competitions, Open Source Contributions

Industry

Software Development

Description

The Core AI group at Reality Labs Research is seeking an outstanding Research Scientist Intern to join and advance multi-modal/vision research. As an intern, you will conduct impactful and ambitious research projects on vision and multi-modal topics in a top-tier industrial lab environment. Your research project includes, but is not limited to:

- Spatial reasoning
- Vision language models
- Visual generation/world modeling
- Unified visual understanding and generation

Our internships are twelve (12) to twenty-four (24) weeks long, with flexible start dates throughout the calendar year.

Responsibilities

- Plan and execute impactful research to advance the state-of-the-art in vision or multi-modal foundation models.
- Publish research findings in top-tier conferences, release code, and effectively present research outcomes to internal and external audiences.

Qualifications

- Programming experience in Python and deep learning frameworks such as PyTorch
- Research and/or work experience in computer vision, multi-modal language models, or generative models
- Proven track record of achieving significant results, as demonstrated by grants, fellowships, patents, and publications at leading conferences such as CVPR, ECCV/ICCV, ICLR, ICML, and NeurIPS
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
- Experience in large-scale vision/multi-modal model training and evaluation
- Experience in large-scale vision data collection and curation
- Experience building systems based on machine learning, reinforcement learning, and/or deep learning methods
- Demonstrated software engineering or research experience via an internship, work experience, coding competitions, widely used contributions in open source repositories (e.g. GitHub), or top-tier publications