Senior Scientist, Synthetic Data and Privacy at NVIDIA

United Kingdom

Full Time

Start Date

Immediate

Expiry Date

17 Jul, 26

Salary

0.0

Posted On

18 Apr, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Synthetic data generation, Data privacy, Differential privacy, LLM, PyTorch, HuggingFace Transformers, PEFT, LoRA, vLLM, TGI, Software engineering, Machine learning, NER, PII detection, Federated learning, CI/CD

Industry

Computer Hardware Manufacturing

Description

NVIDIA is at the forefront of the AI revolution, and our research is shaping the future of large language models. We are looking for a Senior Scientist to join our team and help advance our capabilities in synthetic dataset generation and privacy-preserving AI. You will contribute to open-source libraries within the NVIDIA NeMo ecosystem that enable high-quality synthetic data generation at scale while ensuring data privacy. This role combines hands-on software engineering with research in privacy-enhancing methods, and you will collaborate with research, engineering, and product teams as well as external labs.

What you'll be doing:

Build and implement advanced pipelines for generating synthetic datasets using innovative LLM-based methodologies and automated quality evaluation frameworks.
Research and implement privacy-preserving techniques such as differentially private training (DP-SGD), NER-based detection and replacement of sensitive information, and protection against membership inference.
Design and maintain open-source software libraries and SDKs with clean APIs and developer-facing documentation, applying robust software design patterns.
Drive software excellence through modern development tooling, configuration-driven architecture, and professional Git and CI/CD workflows.
Publish original research at top machine learning and AI conferences to maintain NVIDIA's technical leadership.
Mentor interns and junior researchers to foster technical growth within the team.

What we need to see:

PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience.
5+ years of research experience in synthetic data generation, data privacy, or related areas such as differential privacy, federated learning, or trustworthy machine learning, or comparable experience.
Proven track record of developing or maintaining software libraries used by a broad developer community.
Deep technical understanding of PyTorch and the HuggingFace Transformers ecosystem, including PEFT and LoRA.
Technical familiarity with LLM inference frameworks such as vLLM or TGI.
Strong publication record at premier venues such as NeurIPS, ICML, ICLR, ACL, or similar.

Ways to stand out from the crowd:

Active contributions to open-source projects, particularly in ML, security, or privacy domains.
Specialized expertise with differential privacy concepts and tools such as Opacus.
Ability to build and optimize scalable data processing pipelines for large-scale models.
Proficiency with NER-based PII detection and advanced anonymization techniques.
Functional knowledge of global privacy regulations such as GDPR or CCPA.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and talented people in the world working with us. If you are creative, autonomous, and passionate about building open-source tools that make AI safer and more private, we want to hear from you.

NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society. Learn more about NVIDIA.

Responsibilities

Develop and implement advanced pipelines for synthetic data generation and privacy-preserving AI within the NVIDIA NeMo ecosystem. Collaborate with research and engineering teams to publish original research and maintain high-quality open-source software libraries.