ML Engineer - Voice and Speech to Speech Models at Apna

Bengaluru, Karnataka, India

Full Time

Start Date

Immediate

Expiry Date

06 May, 26

Salary

0.0

Posted On

05 Feb, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

ML Engineering, Voice Models, Speech Models, LLM Fine-Tuning, TTS Systems, STT Systems, Generative Audio, GANs, Diffusion Models, PyTorch, TensorFlow, HuggingFace, Model Deployment, Data Cleaning, Inference Pipelines, Multimodal Interfaces

Industry

Technology; Information and Internet

Description

About Apna:
Apna is India’s largest jobs and professional networking platform and one of India’s fastest unicorns. As we expand Bluemachines AI, our AI-first platform across voice, text, and multimodal workflows, we’re looking for a bold and curious Applied ML Scientist who wants to shape the future of applied Gen AI.

Requirement: 1
Location: Bengaluru (Work from Office - Domlur)
Team: AI & Machine Learning
Experience: 2–7 years

What You'll Do:
Fine-tune and deploy LLMs, TTS, STT, and voice models for use in real-time conversations with millions of users.
Convert unstructured, messy real-world audio/text data into clean, high-quality datasets for training and evaluation.
Build inference pipelines optimized for low-latency, high-accuracy voice agents and multimodal interfaces.
Work closely with infra and product teams to ship production-grade GenAI models with observability, fallback, and monitoring.
Experiment with GANs, diffusion models, audio generation, and multimodal fusion to power next-gen AI agents.
Own the full model lifecycle, from research and training to deployment, testing, and iteration.

What We're Looking For:
2–7 years of hands-on experience in AI/ML roles, ideally in startups or product-driven teams.
Strong grasp of LLM fine-tuning, instruction tuning, or pretraining techniques.
Familiarity with TTS/STT systems such as Whisper, Tacotron, VITS, or other open-source models.
Experience with multimodal architectures, generative audio, GANs, or diffusion-based models.
Ability to work with messy real-world data, design training pipelines, and debug model failure modes.
Fluency in frameworks like PyTorch, HuggingFace, and TensorFlow, and in ecosystem tools (ONNX, Triton, LangChain, etc.).
Passion for building high-impact AI features that ship to real customers.

Why Join Us:
Work at the cutting edge of LLMs, voice AI, and generative models, and ship real products, not just prototypes.
Directly impact millions of users by powering AI agents that help with hiring, learning, and career growth.
Collaborate with a world-class team of AI engineers, researchers, and product minds who move fast and ship boldly.
Freedom to explore: own experiments, propose architecture, or contribute to foundational model training.
Startup speed, enterprise scale: the best of both worlds.
Rapid iteration and direct customer feedback.
Multilingual, India-first problems that push the boundaries of speech, reasoning, and personalization.

Responsibilities

The role involves fine-tuning and deploying voice and speech models for real-time conversations, converting unstructured data into clean datasets, and building optimized inference pipelines.