ML Engineer at High-growth AI software engineering startup at Jack & Jill/External ATS

London, England, United Kingdom

Full Time

Start Date

Immediate

Expiry Date

17 May, 26

Salary

0.0

Posted On

16 Feb, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

PyTorch, Distributed Training, FSDP, NCCL Debugging, LLMs, SFT, Reinforcement Learning, PPO, GRPO, Python, Codebases, RLVR Systems, Long-Context Training, Static Analysis, GPU Clusters, DDP

Industry

Staffing and Recruiting

Description

This is a job that Jill, our AI Recruiter, is recruiting for on behalf of one of our customers. She will pick the best candidates from Jack's network. The next step is to speak to Jack.

ML Engineer

Company Description: High-growth AI software engineering startup

Job Description: You will lead large-scale training for enterprise-grade software engineering LLMs, focusing on supervised fine-tuning and reinforcement learning. By scaling training on multi-node GPU clusters, you'll push the boundaries of how models read, modify, and reason about complex codebases. This role bridges the gap between deep learning research and practical, production-ready engineering agents.

Location: Remote

Why this role is remarkable:
Direct impact on the next generation of AI-driven software engineering tools used by developers worldwide.
Access to massive compute resources for training ≥70B parameter models using multi-node GPU clusters.
Work at the cutting edge of RLVR systems and long-context training within a highly technical team.

What you will do:
Design and execute large-scale SFT and RL training pipelines to align LLMs with software engineering objectives.
Implement and optimize distributed training primitives using PyTorch, FSDP, and custom dataloaders for complex code datasets.
Develop novel reward functions and evaluation suites based on real-world code execution, tests, and static analysis.

The ideal candidate:
Deep proficiency in PyTorch and distributed training at scale (e.g., DDP, FSDP, and NCCL debugging).
Proven experience training large language models (≥70B parameters) and implementing RL algorithms like PPO or GRPO.
Strong software engineering foundations with the ability to write production-grade Python and debug distributed systems.

Who are Jack & Jill?

Ok, I'll go first. I'm Jack, an AI that gets to know you on a quick call, learning what you're great at and what you want from your career. Then I help you land your dream job by finding unmissable opportunities as they come up, supporting you with applications, interview prep, and moral support.

And I'm Jill, an AI Recruiter who talks to companies to understand who they're looking to hire. Then I recruit from Jack's network, making an introduction when I spot an excellent candidate.

Next steps

Step 1. Visit our website.
Step 2. Click 'Talk to Jack'.
Step 3. Talk to Jack so he can understand your experience and ambitions.
Step 4. Jack will make sure Jill (the AI agent working for the company) considers you for this role.
Step 5. If Jill thinks you're a great fit and her client wants to meet you, they will make the introduction.
Step 6. If not, Jack will find you excellent alternatives. All for free.

We never post fake jobs

This isn't a trick. This is an open role that Jill is currently recruiting for from Jack's network. Sometimes Jill's clients ask her to anonymize their jobs when she advertises them, which means she can't share all the details in the job description. We appreciate this can make them look a bit suspect, but there isn't much we can do about it.

Give Jack a spin! You could land this role. If not, most people find him incredibly helpful with their job search, and we're giving his services away for free.

Responsibilities

The role involves leading large-scale training of enterprise-grade software engineering LLMs on multi-node GPU clusters, with a focus on supervised fine-tuning (SFT) and reinforcement learning (RL). Responsibilities include designing and executing SFT and RL training pipelines, implementing and optimizing distributed training primitives, and developing novel reward functions and evaluation suites based on real-world code execution, tests, and static analysis.