GPU Network Software Engineer at Advanced Micro Devices Inc
Santa Clara, CA 95054, USA
Full Time


Start Date

Immediate

Expiry Date

26 Jul, 25

Salary

0.0

Posted On

27 Apr, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Documentation, Debugging, Version Control, Optimization, C++, System Software, Software Development, HIP, C, CUDA, Testing

Industry

Computer Software/Engineering

Description

PREFERRED EXPERIENCE:

  • Strong background developing system software in C/C++
  • Experience with at least one of the following:
    • Implementing communication middleware such as MPI or SHMEM
    • Implementing lower-level communication frameworks such as UCX and libfabric, or developing with RDMA APIs
    • Developing and optimizing collective communication algorithms (e.g., AllReduce)
  • Familiarity with GPU programming in HIP or CUDA
  • In-depth knowledge of software development best practices, including testing, profiling, debugging, documentation, version control, issue tracking, and planning
  • Proven track record of contributing to open-source projects
Responsibilities

WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences: the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_

THE ROLE:

As a GPU network software engineer, you will design, implement, and test networking features in communication libraries, middleware, and frameworks to provide best-in-class support for GPU applications running high-performance computing and machine learning workloads at scale. You will work with technical experts within AMD, our partners, and the open-source community to implement these features as part of AMD’s Radeon Open Ecosystem (ROCm).

KEY RESPONSIBILITIES:

  • Design, implement, and test features to enhance GPU support in communication libraries, middleware, and frameworks
  • Benchmark, profile, and optimize code to maximize the performance of multi-node GPU applications
  • Deliver high-quality code and documentation following best practices for open-source software development
  • Work with key technical experts across AMD and with our partners and customers to improve ROCm applications, libraries, and tools