Member of Technical Staff, Frontiers of Deep Learning Scaling at XAI LONDON LTD
Palo Alto, California, United States
Full Time


Start Date

Immediate

Expiry Date

03 Apr, 26

Salary

440,000

Posted On

03 Jan, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, JAX, PyTorch, Rust, Model-Hardware Co-Design, ML Scaling, Distributed Training, Multi-GPU Training, Training Efficiency, Data Cleaning, Self-Improvement, Continual Learning, Model Architectures, Attention Mechanisms, Non-Autoregressive Models, Training Stability

Industry

Technology; Information and Internet

Description
About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.

We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills: they should be able to concisely and accurately share knowledge with their teammates.

About the Role

The Pretrain team at xAI aims to answer the question: how do we scale up intelligence by scaling up compute effectively? This question breaks down into two sub-questions: what to scale up, and how to scale up.

What to scale up: Next-token prediction is a meaningful training target while online data is plentiful relative to model size, but we are entering a phase in which model size grows faster than data, so a new scaling paradigm is needed. At xAI, our compute grows much faster than at other companies. We believe scaling up effective compute and useful data is the best path to next-level intelligence. What counts as “effective compute” or “useful data”? That is the first question this role is expected to explore and answer. It could be rigorous data cleaning and scaling; discovering new knowledge via self-improvement; a new learning paradigm such as continual learning; unified models for text / code / image / video understanding and generation; or new model architectures, attention mechanisms, or non-autoregressive models. Anything with the potential to become the next scaling paradigm is open to exploration.
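To make the baseline concrete, here is a minimal sketch of the next-token prediction objective the paragraph above refers to, written with plain NumPy; the function name and shapes are illustrative, not part of any xAI codebase.

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of predicting each next token.

    logits:  (seq_len, vocab) unnormalized scores for the next token
    targets: (seq_len,) integer ids of the tokens that actually follow
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability assigned to each true next token.
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll.mean()

# Tiny example: vocabulary of 4 tokens, 3 next-token predictions.
logits = np.array([[2.0, 0.1, 0.1, 0.1],
                   [0.1, 3.0, 0.1, 0.1],
                   [0.1, 0.1, 0.1, 2.5]])
targets = np.array([0, 1, 3])
loss = next_token_loss(logits, targets)
```

Scaling this objective is exactly what becomes data-limited once model size outgrows the available tokens, which is why the role asks what to scale instead.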
How to scale up: Remember that we are aiming at several hundred million GPU hours of training, so even a tiny training-stability issue can ruin the big run. This role therefore also needs to explore how to do large-scale, long-duration training. For example, most reasoning and post-training phases
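One common safeguard for the long-run stability concern above is loss-spike detection: skip or retry an optimizer step whose loss is far outside the recent distribution. The sketch below is a generic illustration of that idea; the window and threshold values are assumptions, not xAI's settings.

```python
import numpy as np

def should_skip_step(loss, history, window=100, threshold=3.0):
    """Return True if `loss` is a spike relative to recent history.

    history:   list of recent per-step losses
    window:    how many recent steps to compare against (illustrative)
    threshold: spike cutoff in standard deviations (illustrative)
    """
    if len(history) < window:
        return False  # not enough history to judge yet
    recent = np.asarray(history[-window:])
    return bool(loss > recent.mean() + threshold * recent.std())
```

In a real training loop this check would run every step, with skipped batches logged for later inspection; at hundreds of millions of GPU hours, even rare unhandled spikes compound into wasted compute.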

How To Apply:

In case you would like to apply to this job directly from the source, please click here

Responsibilities
The role involves exploring how to scale up intelligence by effectively scaling compute and data. It requires hands-on work in building evaluations, preparing data, implementing ideas, and analyzing results.