AI Engineer (VLM & VLA) at Foundation Robotics Labs Inc
San Francisco, California, United States
Full Time


Start Date

Immediate

Expiry Date

09 Jun 2026

Salary

$300,000

Posted On

11 Mar 2026

Experience

5+ years

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Vision-Language-Action Models, Transformers, Diffusion Models, Multimodal Encoders, LLM-Based Reasoning, Action Planning, 2D/3D Perception, Spatial Reasoning, PyTorch, JAX, Distributed Training, Python, ML Tooling, Dataset Creation, Sim2Real, Real-Time Deployment

Industry

Robotics Engineering

Description
Why Are We Hiring for This Role

- Develop and optimize vision-language-action (VLA) models, including transformers, diffusion models, and multimodal encoders/decoders.
- Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning.
- Integrate LLM-based reasoning with action planning and control policies.
- Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data.
- Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion).
- Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans.
- Deploy models on on-board or edge compute platforms, optimizing for latency, safety, and reliability.
- Build scalable pipelines for ingesting, labeling, and generating multimodal training data.
- Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data.
- Optimize training pipelines, model parallelism, and evaluation frameworks.
- Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable.
- Collaborate with product teams to define robot capabilities and user-facing behaviors.
- Participate in user and field testing to iterate on real-world performance.

What Kind of Person Are We Looking For

- Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs.
- Ability to build and iterate on large-scale training pipelines.
- Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration (a minimal distributed-training sketch follows this section).
- Strong software engineering skills in Python and modern ML tooling.
- Experience with (synthetic) dataset creation and curation.
- Understanding of real-time deployment constraints on embedded hardware.
- Preferably, familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar).
- Ideally, hands-on experience with robotics, embodied AI, or reinforcement/imitation learning.
- MSc or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience.

Benefits

We provide market-standard benefits (health, vision, dental, 401(k), etc.). Join us for the culture and the mission, not for the benefits.

Salary

The annual compensation is expected to be between $150,000 and $300,000. Exact compensation may vary based on skills, experience, and location.
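For a flavor of the distributed-training work this role involves, here is a minimal, hypothetical PyTorch DDP sketch; the model, batch contents, and hyperparameters are placeholders standing in for a real VLA training setup.

# Hypothetical minimal sketch of a PyTorch DDP training loop.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
import os

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE; one process per GPU.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")

    # Placeholder model standing in for a VLA policy head.
    model = DDP(torch.nn.Linear(512, 32).to(device), device_ids=[local_rank])
    optim = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(100):
        # Placeholder batch; a real pipeline would pull multimodal
        # (image / language / action) batches via a DistributedSampler.
        x = torch.randn(64, 512, device=device)
        target = torch.randn(64, 32, device=device)

        loss = F.mse_loss(model(x), target)
        optim.zero_grad()
        loss.backward()  # DDP all-reduces gradients across ranks here.
        optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()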
Responsibilities
Develop and optimize vision-language-action models, including designing multimodal datasets and implementing grounding layers to convert natural language instructions into executable action plans for robotics control. This involves deploying models on edge platforms, building scalable data pipelines, and creating Sim2Real training workflows.
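As a rough sketch of the grounding-layer idea above, the following minimal Python example maps a natural language instruction to a skill-level plan; the SkillCall type, skill names, and keyword matcher are illustrative assumptions, whereas a real grounder would use an LLM or VLA head and a symbolic or learned planner.

# Hypothetical sketch: grounding a language instruction into a skill-level plan.
from dataclasses import dataclass


@dataclass
class SkillCall:
    """One step of a skill-level plan: a named skill plus its arguments."""
    skill: str
    args: dict


def ground_instruction(instruction: str) -> list[SkillCall]:
    """Map an instruction to an ordered list of skill calls.

    A trivial keyword-based stand-in for an LLM- or VLA-based grounder:
    it only recognizes a 'pick up X' / 'place it on Y' pattern.
    """
    text = instruction.lower()
    plan: list[SkillCall] = []

    if "pick up" in text:
        obj = text.split("pick up", 1)[1].split(" and ")[0].strip(" .")
        plan.append(SkillCall("navigate_to", {"target": obj}))
        plan.append(SkillCall("grasp", {"object": obj}))

    if "place it on" in text:
        surface = text.split("place it on", 1)[1].strip(" .")
        plan.append(SkillCall("navigate_to", {"target": surface}))
        plan.append(SkillCall("place", {"surface": surface}))

    return plan


if __name__ == "__main__":
    for step in ground_instruction("Pick up the red mug and place it on the shelf."):
        print(step)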