Researcher – Vision Team at Sarvam AI
Bengaluru, Karnataka, India
Full Time


Start Date

Immediate

Expiry Date

16 Jun, 26

Salary

0.0

Posted On

18 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Vision-Language Architectures, Training Methods, Data Strategies, Evaluation Frameworks, Model Failure Modes, PyTorch, Generative AI, VLM Development, Multilingual VLMs, RLHF, DPO, Experimental Design, Open-Source Contributions

Industry

Software Development

Description
Company Overview
Sarvam.ai is a pioneering generative AI startup headquartered in Bengaluru, India. Our mission is to make generative AI accessible and impactful for Bharat. Founded by a team of AI experts, we are building cost-effective, high-performance AI systems tailored for the Indian market, enabling enterprises to deploy speech, language, and vision models at scale. Join us to build the foundational vision models that power the next generation of AI systems for India and beyond.

Job Summary
As a Researcher, you will work across the full lifecycle of VLM development: data, training, evaluation, and getting models into production. The team's scope will evolve as the field does; we want researchers who are comfortable with that and can lead.

Key Responsibilities
* Research vision-language architectures: encoders, fusion mechanisms, pretraining objectives, scaling behaviour
* Design training methods (pretraining, SFT, RLHF, DPO) adapted for multilingual VLMs
* Investigate data strategies: which mixtures, quality signals, and synthetic data approaches actually move the needle
* Build evaluation frameworks and benchmarks, especially for Indic multimodal tasks
* Study model failure modes, robustness, and interpretability
* Work closely with engineers to ensure ideas are testable at scale: prototype fast, then validate properly
* Engage with the broader research community through open-source contributions and collaborations

Must-Have Skills
* Deep understanding of vision-language models: training dynamics, architecture tradeoffs, failure modes
* Track record of strong research, whether through publications, technical reports, or impactful shipped work
* Rigorous experimental design: you isolate variables and draw defensible conclusions
* Strong PyTorch skills: you run your own experiments end to end
* Intellectual range: willingness to work across data, training, and evaluation problems

Good to Have
* PhD/Master's with relevant research experience in ML, Computer Vision, NLP, or a related field
* Research papers published at A/A* venues
* Experience with multilingual or low-resource language modelling
* Familiarity with document understanding, OCR, or structured visual prediction
* Experience with large-scale data curation and its effect on model quality

Location
Bengaluru, India (Hybrid)
Responsibilities
The Researcher works across the entire lifecycle of Vision-Language Model (VLM) development: data handling, training, evaluation, and deployment to production. Key tasks include researching vision-language architectures, designing adapted training methods, investigating data strategies, and building evaluation frameworks.