AI Research Scientist – Datadog AI Research (DAIR) at Datadog

New York, NY 10018, USA

Full Time

Start Date

Immediate

Expiry Date

27 Nov, 25

Salary

400000.0

Posted On

27 Aug, 25

Experience

0 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Good communication skills

Industry

Information Technology/IT

Description

As a research scientist on our team, you will partner with research engineers on fundamental research problems and collaborate with Datadog’s Product and Engineering teams to help translate research advances into tangible benefits for our customers.
Building on our proven track record of AI-powered solutions (e.g., Bits AI, Watchdog, and Toto), Datadog AI Research is tackling high-risk, high-reward projects grounded in real-world challenges in cloud observability and security.

We are currently focused on three key research areas:

Observability Foundation Models – Building state-of-the-art models for advanced forecasting, anomaly detection, and multi-modal telemetry analysis (logs, metrics, traces, etc.). These models will also provide the foundation for our agents (described below) to natively analyze telemetry data.
Site Reliability Engineering (SRE) Autonomous Agents – Creating AI agents to automatically detect, diagnose, and resolve incidents in production environments, pushing the boundaries of multi-step planning, reasoning, and domain-specific knowledge.
Production Code Repair Agents – Developing agents and models that leverage code, logs, runtime data, and other signals to identify, fix, and even preempt performance issues and security vulnerabilities in production code.

Responsibilities

Conduct cutting-edge research in Generative AI and Machine Learning, aiming to build specialized Foundation Models and AI Agents for observability, site reliability engineering, and code repair
Leverage large-scale distributed training infrastructure to pre-train and post-train state-of-the-art models on diverse, real-world telemetry data
Build simulated environments to facilitate on-policy agentic training and evaluation
Lead and contribute to research publications, present findings at top-tier conferences (e.g., NeurIPS, ICLR, ICML), and help open-source key model artifacts and benchmarks
Collaborate with cross-functional teams (e.g., Product, Engineering) to integrate advanced AI capabilities – like multi-modal analysis or automated incident resolution planning – into Datadog’s product ecosystem
Stay at the forefront of LLMs, Foundation Models, and Generative AI research and engage with the external research community
Foster a culture of scientific rigor, innovation, and practical impact, e.g., by actively participating in reading groups and mentoring interns