Start Date
Immediate
Expiry Date
25 Oct, 25
Salary
0.0
Posted On
25 Jul, 25
Experience
3 year(s) or above
Remote Job
Yes
Telecommute
Yes
Sponsor Visa
No
Skills
Good communication skills
Industry
Information Technology/IT
WHAT WE’RE LOOKING FOR
Education: Bachelor’s or Master’s in Computer Science, Electrical Engineering, or a related field.
Experience: 3+ years in software engineering focused on real-time systems, multimedia processing, or AI integration.
Programming Skills: Proficiency in Python, C++, or JavaScript, with experience using ML frameworks such as TensorFlow or PyTorch.
Multimedia Knowledge: Familiarity with WebRTC, streaming protocols, and audio/video processing.
AI Integration: Experience deploying ML models in production—especially in vision, speech, or NLP domains.
Analytical Thinking: Strong debugging and problem-solving skills in complex systems.
THE ROLE
As a Conversational Video Interface (CVI) Engineer, you’ll be at the forefront of building and optimizing our CVI platform: real-time, multimodal AI systems that bring digital avatars to life. You’ll collaborate across engineering, research, and design teams to integrate vision, speech, and emotional intelligence into seamless, human-like conversations.
WHAT YOU’LL DO
Develop & Optimize CVI Components: Build core systems for WebRTC/video conferencing, speech recognition (ASR), text-to-speech (TTS), vision processing, and replica video output.
Integrate Multimodal AI Models: Work with in-house models like Phoenix-3 (avatar rendering), Sparrow-0 (conversational pacing), and Raven-0 (visual perception) to create responsive digital twins.
Ensure Real-Time Performance: Optimize the CVI pipeline to keep utterance-to-utterance latency under 600 ms.
Collaborate Cross-Functionally: Partner with AI researchers, product managers, and UX designers to align development with user needs.
Enhance API Infrastructure: Build and maintain scalable APIs for external developers to integrate CVI into their platforms.