Multimodal Speech Engineer, AI Companion at 1X Technologies AS
Palo Alto, California, United States
Full Time


Start Date

Immediate

Expiry Date

11 Feb, 26

Salary

$250,000

Posted On

13 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Speech Modeling, Audio Modeling, Conversational Models, Data Pipelines, Real-Time Architectures, Body Language Synchronization, Personality Customization, Creative Problem Solving

Industry

Robotics Engineering

Description
About 1X

We’re an AI and robotics company based in Palo Alto, California, on a mission to build a truly abundant society through general-purpose robots capable of performing any kind of work autonomously. We believe that to truly understand the world and grow in intelligence, humanoid robots must live and learn alongside us. That’s why we’re focused on developing friendly home robots designed to integrate seamlessly into everyday life. We’re looking for curious, driven, and passionate people who want to help shape the future of robotics and AI. If this mission excites you, we’d be thrilled to hear from you and explore how you might contribute to our journey.

Role Overview

The AI Companion team creates the speech interface for NEO, as well as the physical awareness behaviors that evoke trust, warmth, and competence when NEO interacts with people. As a Multimodal Speech Engineer on the AI Companion team, you will lead the effort to create a conversational speech model, from design to data collection to deployment. You will develop real-time architectures that enable NEO not only to converse with users but also to incorporate other modalities such as vision, spatial audio, and body language. You will work closely with the design team to reflect NEO’s personality and 1X’s brand values in the way NEO speaks and responds to users, and with the autonomy team to ensure that NEO’s speech models are aware of its own physical capabilities.

Responsibilities

- Design and implement data pipelines for large-scale speech interactions from NEO data and external datasets
- Train speech-to-speech models to be aware of NEO’s embodiment
- Design appropriate responses for a variety of user queries
- Synchronize speech with body language
- Customize NEO with different personalities

Qualifications

- 3+ years of experience in speech and audio modeling domains
- Experience with multimodal conversational models (language, audio, vision) is a strong plus
- Ability to take open-ended problems in conversational models, come up with creative solutions, implement proofs of concept, and translate them to production

Benefits & Compensation

- Salary range: $150,000 - $250,000
- Health, dental, and vision insurance
- 401(k) with company match
- Paid time off and holidays

Equal Opportunity Employer

1X is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, ancestry, citizenship, age, marital status, medical condition, genetic information, disability, military or veteran status, or any other characteristic protected under applicable federal, state, or local law.
Responsibilities
The Multimodal Speech Engineer will lead the creation of a conversational speech model for NEO, including design, data collection, and deployment. Responsibilities include developing real-time architectures and synchronizing speech with body language.