Senior Deep Learning Scientist, Speech Synthesis at NVIDIA

Hà Nội, Vietnam

Full Time

Start Date

Immediate

Expiry Date

21 Jul, 26

Salary

Not specified

Posted On

22 Apr, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Skills

Python, Machine Learning, Deep Learning, Speech Synthesis, PyTorch, CNNs, RNNs, LSTMs, Transformers, DSP, Feature extraction, Git, Gerrit, GitLab, TTS, Voice cloning

Industry

Computer Hardware Manufacturing

Description

NVIDIA is a global leader in AI, high-performance computing, and visualization, with GPU technology powering everything from modern computers to robots and autonomous systems. As a pioneer in AI computing, NVIDIA is shaping the future of conversational AI. NVIDIA is looking for Speech Data Scientists to develop Riva, a high-impact, high-visibility Speech AI product, and to improve the experience of millions of customers. If you're creative and passionate about solving real-world conversational AI problems, come join our Riva Product Engineering team. More details: https://developer.nvidia.com/riva

What you'll be doing:
- Train speech synthesis models, including mel-spectrogram generators and vocoders.
- Measure, benchmark, and analyze model performance, accuracy, and bias, and recommend improvements.
- Maintain the TTS model evaluation system and characterize quality metrics across platforms.
- Improve processes for speech data processing, augmentation, filtering, and TTS training set preparation.
- Build knowledge of TTS datasets for training and evaluation.
- Collaborate with cross-functional teams on new features, improvements, and issue triage.
- Participate in code, design, use case, and test plan reviews.

What we need to see:
- Master’s degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, AI, Applied Math, Linguistics, or Computational Linguistics.
- 5+ years of experience in machine learning and AI model development.
- Strong Python programming skills, with solid fundamentals in software design and optimization.
- Strong knowledge of ML/DL techniques and tools, including CNNs, RNNs/LSTMs, and Transformers.
- Hands-on experience training speech synthesis models, including TTS, voice cloning, or speech-to-speech systems.
- Proficiency with PyTorch and familiarity with DSP and feature extraction techniques (FFT, MFCC, mel spectrograms).
- Experience with Git, Gerrit, or GitLab, and strong collaboration skills.

Ways to stand out from the crowd:
- Experience with multilingual or code-switched TTS, voice cloning, or cross-lingual voice cloning.
- Familiarity with text normalization, inverse text normalization, and multilingual G2P (grapheme-to-phoneme) systems.
- Interest in linguistics, phonetics, phonology, and language technologies.
- Strong C++ programming skills and familiarity with CUDA, cuDNN, or TensorRT.
- Experience deploying ML models on data center, cloud, or embedded systems.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com/

NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society.

Responsibilities

The role involves training and benchmarking speech synthesis models, including mel-spectrogram generators and vocoders, to improve product performance. You will also collaborate with cross-functional teams to refine speech data processing pipelines and evaluate model quality metrics.