VLAM Engineer at Foundation Robotics Labs Inc
San Francisco, California, United States
Full Time


Start Date

Immediate

Expiry Date

10 Mar 2026

Salary

$80,000 - $1,000,000 per year

Posted On

10 Dec 2025

Experience

5 years or more

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Vision-Language-Action Models, Transformers, Diffusion Models, Multimodal Encoders, PyTorch, JAX, Robotics, Reinforcement Learning, Software Engineering, Humanoid Robots, Dataset Creation, Real-Time Deployment, Machine Learning, Spatial Reasoning, Simulation-to-Real Training, Action Planning

Industry

Robotics Engineering

Description
Why Are We Hiring for This Role
- Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders.
- Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning.
- Integrate LLM-based reasoning with action planning and control policies.
- Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data.
- Interface VLAM outputs with real-time robot control stacks (navigation, manipulation, locomotion).
- Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans.
- Deploy models on on-board or edge compute platforms, optimizing for latency, safety, and reliability.
- Build scalable pipelines for ingesting, labeling, and generating multimodal training data.
- Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data.
- Optimize training pipelines, model parallelism, and evaluation frameworks.
- Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable.
- Collaborate with product teams to define robot capabilities and user-facing behaviors.
- Participate in user and field testing to iterate on real-world performance.

What Kind of Person Are We Looking For
- Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs.
- Ability to build and iterate on large-scale training pipelines.
- Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration.
- Hands-on experience with robotics, embodied AI, or reinforcement/imitation learning.
- Strong software engineering skills in Python and modern ML tooling.
- Experience with humanoid robots, manipulation, or whole-body control systems.
- Familiarity with robotics frameworks (ROS, Isaac, MuJoCo, or similar).
- Experience with (synthetic) dataset creation and curation.
- Understanding of real-time deployment constraints on embedded hardware.
- MSc or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience.

Benefits
We provide market-standard benefits (health, vision, dental, 401k, etc.). Join us for the culture and the mission, not for the benefits.

Salary
The annual compensation is expected to be between $80,000 and $1,000,000. Exact compensation may vary based on skills, experience, and location.
Responsibilities
Develop and optimize vision-language-action models and integrate LLM-based reasoning with action planning. Collaborate with various teams to ensure model outputs are executable and safe.
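To give a concrete flavor of the kind of system this role builds, below is a minimal, illustrative sketch of a vision-language-action policy interface in PyTorch. It is a toy stand-in with assumed names and sizes (ToyVLAMPolicy, a small CNN and embedding layer in place of pretrained vision-transformer and LLM backbones), not Foundation Robotics' actual architecture; a production VLAM would also add a diffusion or autoregressive action decoder and a real-time control interface.

```python
# Illustrative only: a toy VLAM policy sketch, not the company's actual stack.
import torch
import torch.nn as nn


class ToyVLAMPolicy(nn.Module):
    """Encode an image and a tokenized instruction, fuse the two modalities,
    and regress a continuous action vector."""

    def __init__(self, vocab_size=1000, embed_dim=256, action_dim=7):
        super().__init__()
        # Vision encoder: a small CNN standing in for a vision transformer.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Language encoder: token embedding + mean pooling standing in for an LLM.
        self.text = nn.Embedding(vocab_size, embed_dim)
        # Fusion + action head: maps the joint embedding to an action vector.
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, action_dim),
        )

    def forward(self, image, instruction_tokens):
        v = self.vision(image)                         # (B, embed_dim)
        t = self.text(instruction_tokens).mean(dim=1)  # (B, embed_dim)
        return self.head(torch.cat([v, t], dim=-1))    # (B, action_dim)


if __name__ == "__main__":
    policy = ToyVLAMPolicy()
    image = torch.randn(2, 3, 224, 224)        # batch of RGB frames
    tokens = torch.randint(0, 1000, (2, 16))   # tokenized instructions
    print(policy(image, tokens).shape)         # torch.Size([2, 7])
```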