Member of Technical Staff, Kernels at INCEPTION ARTIFICIAL INTELLIGENCE L.L.C - O.P.C
San Francisco, California, United States
Full Time


Start Date

Immediate

Expiry Date

08 Jun, 26

Salary

0.0

Posted On

10 Mar, 26

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

CUDA, CuTe, Triton, Attention, Matrix Multiplication, Normalization, PyTorch, TensorFlow, Performance Optimization, Profiling, Low-precision Formats, XLA, TVM, Distributed Training, Python, C++

Industry

Technology; Information and Internet

Description
The Role

We're looking for engineers and scientists to design, optimize, and maintain the compute foundations that power large-scale language model training and inference. You will develop high-performance ML kernels, enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training and serving large models possible.

Key Responsibilities

* Design and implement custom ML kernels (CUDA, CuTe, Triton) for core dLLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU architectures.
* Design compute primitives to reduce memory bandwidth bottlenecks and improve kernel efficiency.
* Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.

Qualifications

* BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
* Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.
* Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
* Background in performance optimization and profiling of ML systems.
* Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (XLA, TVM).
* Familiarity with distributed training techniques (data parallel, model parallel, pipeline parallel).
* Proficiency in Python and at least one systems programming language (C++/Rust/Go).
* Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.

Preferred Skills

* Experience building and maintaining large-scale language models with tens of billions of parameters or more.
* Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
* Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM.
* Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.
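To make the low-precision qualification concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. The function names are hypothetical and for illustration only; in practice this work lands on-device as CUDA or Triton kernels operating on tensors, not Python lists.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization.

    Maps floats to integers in [-127, 127] using a single scale
    derived from the largest magnitude in the tensor.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [x * scale for x in q]
```

The round trip is lossy: the quantization error is bounded by roughly half the scale, which is the trade-off that makes INT8 matmul and attention kernels both faster and cheaper in memory bandwidth.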
Responsibilities
Engineers will design, optimize, and maintain compute foundations for large-scale language model training and inference, focusing on developing high-performance ML kernels and improving the distributed compute stack. Key tasks include implementing custom ML kernels for core dLLM operations and contributing to infrastructure stability and scalability.
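The memory-bandwidth focus above can be sketched with the classic tiling idea: process a matrix multiply in fixed-size blocks so each working set stays small and gets reused, the same locality principle GPU kernels apply with shared memory. This pure-Python version is illustrative only, not how a production CUDA/Triton kernel would be written.

```python
def blocked_matmul(A, B, tile=32):
    """Multiply dense matrices A (m x k) and B (k x n) using tiling.

    Iterating over tile-sized sub-blocks reuses each loaded value many
    times before moving on, cutting traffic to slow memory.
    """
    m, k = len(A), len(A[0])
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Accumulate the contribution of one tile pair into C.
                for i in range(i0, min(i0 + tile, m)):
                    for kk in range(k0, min(k0 + tile, k)):
                        a = A[i][kk]
                        for j in range(j0, min(j0 + tile, n)):
                            C[i][j] += a * B[kk][j]
    return C
```

On a GPU the tile would live in shared memory or registers and the loops would map to thread blocks, but the loop structure is the same.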