Start Date
Immediate
Expiry Date
16 Nov, 25
Salary
251000.0
Posted On
16 Aug, 25
Experience
0 year(s) or above
Remote Job
Yes
Telecommute
Yes
Sponsor Visa
No
Skills
Deep Learning, Performance Management, RDMA, Software Engineers, Computer Science, Machine Learning, Software Development, Embedded Systems, MPI
Industry
Computer Software/Engineering
NETWORK ENGINEERING
In this role, you will be a member of the Host Side Collective Communication Libraries (HCCL) software team and part of the Host Side Networking (HSN) organization within the DC Networking and Network Infra organizations. The team develops and owns the software stack around collective communication libraries for Meta's internally developed AI accelerator, the Meta Training and Inference Accelerator (MTIA).

At a high level, the team’s charter is to enable and optimize the performance of HCCL, our internally developed communication library for the MTIA. Currently, one of the team’s focus areas is standing up our first communications stack targeting various AI training workloads. The team is building the communications software stack, including hardware-customized features, software benchmarks, and performance tooling.

Specifically, we are looking for people to lead the design, development, and operation of some of the largest AI infrastructure in the world. This is a rare opportunity to work with leading AI experts in the industry and build cutting-edge AI communications infrastructure.
MINIMUM QUALIFICATIONS
PREFERRED QUALIFICATIONS