Member of Technical Staff, Image Generation at Mirage
New York, New York, USA
Full Time


Start Date

Immediate

Expiry Date

Dec 4, 2025

Salary

300,000

Posted On

Sep 4, 2025

Experience

0+ years

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Machine Learning, Testing, Analytical Skills, Computer Science, Triton, Rapid Prototyping, Code Review, Optimization Techniques, Cuda, Experimental Design, Prototype

Industry

Information Technology/IT

Description

Mirage is redefining short-form video with frontier AI research.
We’re building full-stack foundation models and products that are changing the future of this format and of video creation, production, and editing more broadly. Over 20 million creators and businesses use Mirage’s products to reach their full creative and commercial potential.
We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you’ll have an opportunity to have an outsized impact on our products and our company’s culture.

PREFERRED QUALIFICATIONS

Research Experience

  • Master’s or PhD in Computer Science, Machine Learning, or a related field or equivalent practical experience.
  • Demonstrated experience implementing and improving state‑of‑the‑art generative image models.
  • Deep expertise in generative modeling approaches (flow matching / diffusion, autoregressive models, VAEs, GANs, etc.).
  • Strong background in optimization techniques, sampling, and loss‑function design.
  • Experience with empirical scaling studies and systematic architecture research.
  • Track record of research contributions at top ML conferences (NeurIPS, CVPR, ICCV, ICML, ICLR).

Technical Expertise

  • Strong proficiency in modern deep‑learning tooling (PyTorch, CUDA, Triton, FSDP, etc.).
  • Experience training image diffusion models with billions of parameters.
  • Familiarity with large language models or multimodal transformers is a plus.
  • Deep understanding of attention, transformers, latent representations, and modern image‑text alignment techniques.
  • Expertise in distributed training systems, model parallelism, and high‑throughput inference.
  • Proven ability to implement and improve complex model architectures end to end.

Engineering Capabilities

  • Ability to write clean, modular research code that scales from prototype to production.
  • Strong software‑engineering practices including testing, code review, and CI/CD.
  • Experience with rapid prototyping and experimental design under tight iteration loops.
  • Strong analytical skills for debugging model behavior, numerical stability, and performance bottlenecks.
  • Familiarity with profiling and optimization tools (Nsight, TensorBoard, PyTorch Profiler, etc.).
  • Track record of bringing research ideas to production and maintaining high code quality in a research environment.


Responsibilities

ABOUT THE ROLE

Mirage is seeking an exceptional Member of Technical Staff to advance the state‑of‑the‑art in large‑scale image generation models. You’ll conduct novel research on generative image models for people and storytelling, developing new training techniques and scaling models to billions of parameters and millions of users. As a key member of our AI team, you’ll work at the cutting edge of image generation systems that enable natural, expressive, and high‑fidelity outputs.
Our team has strong expertise in training large‑scale models with demonstrated research and product impact (see our recent whitepaper for details). We’re especially excited to push the use of image synthesis for multimodal video generation, with a focus on photorealistic quality, professional‑grade expressivity, and creative iteration. Our models power tools used by millions of creators, and we’re tackling fundamental challenges in how to generate compelling composition, lighting, and fine‑grained detail across diverse domains.

KEY RESPONSIBILITIES

Research & Architecture Development

  • Design and implement large‑scale image generation models (transformers, latent diffusion, flow matching, etc.).
  • Develop new approaches to multimodal conditioning and generation (e.g., audio and video) and controllability (editing, multi‑frame consistency, script guidance, etc.).
  • Research advanced image‑editing and image‑generation techniques such as content‑preserving edits, multi‑input conditioning, and reference‑based generation.
  • Establish and validate scaling laws for image diffusion models across resolution and parameter count.
  • Develop automated evaluation approaches for measuring fidelity and consistency.
  • Drive rapid experimentation with model architectures, sampling strategies, and training methods.
  • Validate research directly through product deployment and real user feedback.
  • Derive insights from data and recommend architectures and training practices that will make meaningful impacts on our products.

Model Training & Optimization

  • Train and optimize models at massive scale (10s–100s of billions of parameters) across multi‑node GPU clusters.
  • Push the boundaries of efficiency and hardware utilization to train and deploy models cost‑effectively.
  • Develop sophisticated distributed training approaches using FSDP, DeepSpeed, Megatron‑LM, Triton, and custom CUDA kernels where needed.
  • Design and implement model‑compression techniques (pruning, distillation, quantization, etc.) for efficient serving.
  • Create new approaches to memory optimization, gradient checkpointing, and mixed‑precision training.
  • Research techniques for improving sampling speed (DDIM, PFGM++, SDE‑VE) and training stability at scale.
  • Conduct systematic empirical studies to benchmark architecture and optimization choices.