Research Scientist, Latent State Inference for World Models at Toyota Research Institute
Los Altos, CA 94022, USA
Full Time


Start Date

Immediate

Expiry Date

06 Nov, 25

Salary

$264,000

Posted On

07 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Sponsor Visa

No

Skills

Sensor Fusion, RSS, Predictive Modeling, Perception, Motion, Robotics, Machine Learning, State Estimation, Computer Science, Decision Making, Python

Industry

Information Technology/IT

Description

At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team in Automated Driving, Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavior Models, and Robotics.
Within the Human Interactive Driving division, the Extreme Performance Intelligent Control department is working to develop scalable, human-like driving intelligence by learning from expert human drivers. This project focuses on creating a configurable, data-driven world model that serves as a foundation for intelligent, multi-agent reasoning in dynamic driving environments. By tightly integrating advances in perception, world modeling, and model-based reinforcement learning, we aim to overcome the limitations of more compartmentalized, rule-based approaches. The end goal is to enable robust, adaptable, and interpretable driving policies that generalize across tasks, sensor modalities, and public road scenarios—delivering ground-breaking improvements for ADAS, autonomous systems, and simulation-driven software development.
We are seeking a forward-thinking Research Scientist to focus on inferring latent state representations from sensor data, powering world models, and supporting rigorous policy evaluation for autonomous vehicles. This role spans raw perception and structured representations, enabling both high-fidelity predictive modeling and reliable policy assessment in simulated or learned environments.
You will work closely with researchers developing world models and those focused on policy evaluation, ensuring that the latent states inferred from real-world sensors are semantically rich, temporally coherent, and suitable for both long-horizon prediction and counterfactual analysis.

QUALIFICATIONS



    • PhD in Computer Science, Machine Learning, Robotics, or a related field.

    • Strong foundation in representation learning or state estimation for sequential decision-making.
    • Robust experience in deep generative models (e.g., VAEs, diffusion models, autoregressive models).
    • Experience training perception models on large-scale real-world sensor datasets from autonomous driving, robotics, or similar domains.
    • Experience with latent world models, generative AI for perception, or contrastive learning.
    • Familiarity with structure-from-motion, Gaussian splatting, or neural radiance fields (NeRFs).
    • Experience with multi-modal sensor fusion, state estimation, and SLAM techniques.
    • Familiarity with uncertainty-aware perception, active perception, and predictive modeling.
    • Accomplished publication record at top-tier conferences such as NeurIPS, CVPR, ICCV, ICLR, ICRA, CoRL, or RSS.
    • Strong programming skills in Python and deep learning frameworks such as PyTorch or JAX.
    • Excellent problem-solving skills and the ability to work in a fast-paced team research environment.

BONUS QUALIFICATIONS

    • Background building or using world models in model-based RL, planning, or simulation.

    • Familiarity with latent-space rollouts, policy evaluation metrics, or offline RL tools.
    • Experience working in high-dimensional, real-time environments with latency constraints.

RESPONSIBILITIES


    • Design and train learning-based systems that transform raw multimodal sensor data (e.g., images, lidar, radar) into compact, dynamic latent states suitable for use in learned world models.

    • Investigate unsupervised, self-supervised, and contrastive methods to learn latent spaces that encode dynamics, semantics, and uncertainty.
    • Incorporate temporal information and motion consistency into latent state estimation using recurrent, filtering, or transformer-based architectures.
    • Combine data from heterogeneous modalities into unified latent state representations that generalize across conditions and scenarios.
    • Ensure the learned representations are resilient to occlusion, sensor degradation, and distributional shift.
    • Collaborate on joint research agendas with world modeling and policy evaluation researchers to explore uncertainty modeling, interpretability, and representation bottlenecks.
    • Publish novel research, contribute to open-source tools, and engage with the academic community at major ML and robotics conferences.