Start Date
Immediate
Expiry Date
17 Jul, 25
Salary
0.0
Posted On
14 May, 25
Experience
0 year(s) or above
Remote Job
Yes
Telecommute
Yes
Sponsor Visa
No
Skills
Software Development, Embedded Systems, ML, Software Engineers, RDMA, Deep Learning, Computer Science
Industry
Information Technology/IT
In this role, you will be a member of the Network AI Software team, which is part of the larger DC networking organization. The team develops and owns the software stack around collective communication libraries at Meta. At a high level, the team aims to enable Meta-wide ML products and innovations to leverage our large-scale training and inference fleet through an observable, reliable and high-performance distributed AI communication stack. Currently, one of the team’s focuses is building customized features, SW benchmarks, performance tuners and SW stacks around PyTorch to improve full-stack distributed ML reliability and performance (e.g. large-scale GenAI/LLM training), from the trainer down to the network communication layer. We are seeking leaders to work in the space of GenAI/LLM scaling reliability and performance.
MINIMUM QUALIFICATIONS:
PREFERRED QUALIFICATIONS: