Deep Learning Solution Architect at NVIDIA

Beijing, Beijing, China

Full Time

Start Date

Immediate

Expiry Date

03 Jul, 26

Salary

0.0

Posted On

04 Apr, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Large Language Models, RAG workflows, Agentic inference, PyTorch, Hugging Face Transformers, GPU computing, Distributed parallel training, TRT-LLM, Megatron-LM, NVIDIA NeMo, Docker, Kubernetes, Performance tuning, System architecture, Technical communication

Industry

Computer Hardware Manufacturing

Description

NVIDIA is seeking dynamic Solution Architects with specialized expertise in training Large Language Models (LLMs), implementing RAG workflows, and designing agentic inference. You will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers.

With competitive salaries and a generous benefits package, we are widely considered to be one of the world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are expanding rapidly. If you're a creative and autonomous person with a real passion for technology, we want to hear from you.

What You Will Be Doing:

Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.
Design and integrate RAG workflows and agentic inference pipelines into customer systems, and provide technical guidance on best practices.
Collaborate with NVIDIA engineering teams to provide feedback, and support pre-sales technical activities (workshops, demos).

What We Need to See:

Master’s or Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience.
4+ years of hands-on experience in AI, focusing on open-source LLM training, fine-tuning, and production inference optimization.
Deep understanding of mainstream LLM architectures and proficiency in LLM customization via PyTorch and Hugging Face Transformers.
Solid knowledge of GPU computing, cluster architecture, and distributed parallel training and inference for LLMs.
Competency in agentic inference design and in using AI agents to solve business challenges.
Strong communication skills, with the ability to articulate complex technical concepts to technical and non-technical stakeholders.

Ways to Stand Out from the Crowd:

Hands-on experience with NVIDIA’s generative AI ecosystem (TRT-LLM, Megatron-LM, NVIDIA NeMo).
Advanced skills in LLM optimization (quantization, KV cache tuning, memory footprint reduction).
Experience with Docker and Kubernetes for containerized LLM and agent workflow deployment on-prem.
In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.

NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society.

Responsibilities

Architect and deliver end-to-end generative AI solutions including LLM pretraining, fine-tuning, and RAG workflows using NVIDIA hardware and software. Collaborate with enterprise customers to design tailored AI systems and provide technical guidance on performance optimization and deployment.