Master Principal Cloud Engineer – GPU & AI Infrastructure at Oracle Risk Management Services
Yuexiu District, Guangdong Province, China
Full Time


Start Date

Immediate

Expiry Date

02 Jun, 26

Salary

0.0

Posted On

04 Mar, 26

Experience

10 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

GPU Specialist, Cloud Engineering, HPC, Artificial Intelligence, NVIDIA Hardware, AMD Hardware, RDMA Networking, Architectural Design, LLMs, Generative AI, RoCE v2, Proof-of-Concept, Triton Inference Server, NeMo Framework, NCCL, DeepSpeed

Industry

IT Services and IT Consulting

Description
Position Overview

As a GPU Specialist Cloud Engineer (CE) within the Oracle Cloud Infrastructure (OCI) Pre-Sales organization, you will serve as the primary technical authority for high-performance computing (HPC) and Artificial Intelligence infrastructure. You are not just a generalist; you are the bridge between complex silicon capabilities and transformative business outcomes. You will partner with Enterprise Sales teams to lead the technical discovery, architectural design, and proof-of-concept (PoC) execution for customers building the next generation of Large Language Models (LLMs), generative AI applications, and computationally intensive simulations. This role requires a deep understanding of NVIDIA/AMD hardware stacks, RDMA networking, and the software orchestration layers that make massive-scale GPU clusters hum.

Core Responsibilities

1. Strategic Technical Advisory

* Architectural Design: Design end-to-end AI infrastructure solutions on OCI, focusing on Superclusters that leverage NVIDIA H200/B300/GB300 or AMD Instinct™ accelerators.
* Optimization: Advise customers on right-sizing GPU shapes based on workload requirements (e.g., training vs. inference, FP8 vs. FP16 precision).
* Networking Excellence: Design high-throughput, low-latency interconnect fabrics using RoCE v2 (RDMA over Converged Ethernet) and OCI’s non-blocking leaf-spine architecture.

2. Hands-on Execution & Validation

* Proof of Concept (PoC): Lead deep-dive technical evaluations, demonstrating OCI’s superior price-performance ratios for model training and fine-tuning.
* Stack Integration: Assist customers in deploying and optimizing the NVIDIA AI Enterprise stack, Triton Inference Server, and NeMo Framework on OCI.
* Performance Tuning: Work directly with engineering teams to troubleshoot bottlenecks, whether they reside in the kernel, the NCCL (NVIDIA Collective Communications Library) configuration, or the storage IOPS.

3. Thought Leadership & Enablement

* Content Creation: Develop whitepapers, reference architectures, and blog posts detailing OCI’s competitive advantages in the AI sovereign cloud and private AI spaces.
* Market Intelligence: Stay ahead of the curve on the evolving landscape of AI accelerators, interconnects (InfiniBand vs. Ethernet), and distributed training frameworks (PyTorch, JAX, DeepSpeed).

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities
The role involves serving as the primary technical authority for high-performance computing and AI infrastructure, designing end-to-end AI solutions on OCI leveraging advanced accelerators and high-throughput networking. Responsibilities also include leading technical evaluations, optimizing stack integration for customers, and creating thought leadership content.