AI Product Engineer (LLM Agents & SRE Automation) (f/m/x) at ilert GmbH

Cologne, North Rhine-Westphalia, Germany

Full Time

Start Date

Immediate

Expiry Date

27 Feb, 26

Salary

Not specified

Posted On

29 Nov, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Skills

AI-Powered Applications, LLM Agents, Prompt Engineering, Agent Design, Multi-Step Tool-Use Flows, Software Engineering, API Integration, Reliability, Safety, Controlled Automation, Product Mindset, SRE, DevOps, Incident Response, Observability Tools, Kubernetes, Production Agent Frameworks

Industry

Software Development

Description

Team: Product & Engineering • Reports to the CTO
Location: Hybrid in Cologne (Rheinauhafen); 3 days in the office, 2 days remote (Tue and Thu)

Shape the future of autonomous incident response

We’re on a mission to make downtime invisible. Thousands of DevOps and SRE teams rely on ilert to detect, resolve, and communicate incidents faster. As our first AI Product Engineer, you’ll build the core of ilert’s AI-first strategy: autonomous, tool-using agents that diagnose alerts, run root cause analysis, execute safe mitigations, and keep services healthy. This is a hands-on role in which you’ll turn operational expertise and product insight into real, reliable AI systems used in production.

Tasks

Design & Build AI Agents
- Design agent reasoning loops, prompts, and safety constraints.
- Build multi-step tool-using agents (logs, metrics, traces, k8s, Git, CI/CD, cloud APIs).
- Implement autonomy flows: investigation → analysis → mitigation → validation.

Ship Product Features
- Work with product and engineering to build AI-backed features that solve real customer problems.
- Translate complex SRE workflows into intuitive, AI-powered user experiences.
- Own features end-to-end (design → prototype → implementation → rollout).

Integrate with Observability & Ops Tooling
- Connect LLM agents to Grafana, Prometheus, Kubernetes, GitHub, CI/CD, cloud services, etc.
- Design safe tool schemas and APIs for autonomous execution.

Ensure Reliability, Safety & Determinism
- Build guardrails for safe, reversible mitigations.
- Validate model output with structured schemas (e.g., Zod, JSON Schema).
- Establish evaluation suites, test harnesses, and monitoring for agent performance.

Collaborate Across Teams
- Work with SREs to encode operational expertise into agents.
- Partner with Product to shape requirements and roadmap decisions.
- Influence ilert’s broader AI strategy.

Requirements

Must-Have Skills
- Experience building AI-powered applications with LLMs (OpenAI, Anthropic, etc.)
- Strong prompt engineering and agent design skills
- Experience implementing multi-step tool-use flows
- Solid software engineering fundamentals (preferably Rust)
- Experience integrating with APIs, backend services, or automations
- Ability to reason about reliability, safety, and controlled automation
- Product mindset: able to turn ambiguous problems into shippable solutions

Nice-to-Have Skills
- Background in SRE, DevOps, or incident response
- Experience with observability tools (Grafana, Prometheus, Elastic, Datadog, New Relic)
- Hands-on Kubernetes knowledge
- Experience with production agent frameworks and patterns (ReAct, LangChain, LangGraph, custom state machines)

Soft Skills
- You love building real products, not demos
- Strong communication and critical thinking
- Comfortable working with high autonomy and ownership
- Passion for reliability, automation, and removing toil

Benefits
- Build one of the first real autonomous SRE agents in the industry
- Product-centric culture: be part of a team that's 100% committed to solving a critical issue for businesses that offer round-the-clock services
- Hybrid work environment: enjoy the best of both worlds with in-person collaboration and remote work flexibility
- No Meetings #hackfwd: maximize productivity by keeping meetings to a minimum and focusing on your core responsibilities
- High-impact, high-ownership role: your work ships to customers quickly
- Small, senior team with fast decision-making
- Modern tech stack and strong engineering culture
- Direct involvement in shaping the future of on-call and incident response
- Founder-led startup

Please include one link (GitHub, repo, notebook, or demo) that best showcases your experience building AI-powered or agentic systems.

Responsibilities

As the first AI Product Engineer, you will design and build AI agents that autonomously diagnose alerts and execute safe mitigations. You will also work closely with product and engineering teams to develop AI-backed features that enhance user experience.