
LLM Diagnostics and Intelligent Runbooks

First published by Atif Alam

Job descriptions often ask for LLM-based diagnostics and intelligent runbooks. This page maps those phrases to workflows you can describe and prototype.

By the end of this page, you should be able to sketch an LLM-assisted triage flow and list what to verify before automation touches production.

Tools such as Cursor, Continue.dev, or Aider help you author:

  • runbooks and Terraform,
  • scripts for one-off remediation,
  • queries for metrics and logs.

Minimum bar: Evaluate LLM-generated code for correctness, security (secrets, overly broad IAM), and edge cases—not just whether it “runs once.”
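As a minimal sketch of that review step, here is a dependency-free lint pass over generated text. The patterns are illustrative only; a real review would combine human reading with dedicated scanners (e.g. secret scanners and IaC policy checkers).

```python
import re

# Illustrative patterns, not exhaustive: hardcoded credentials and
# wildcard IAM statements are two of the most common LLM-generated issues.
SECRET_PATTERN = re.compile(
    r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"]+['\"]"
)
BROAD_IAM_PATTERN = re.compile(r'"Action"\s*:\s*"\*"|"Resource"\s*:\s*"\*"')

def review_generated_code(text: str) -> list[str]:
    """Flag obvious problems in LLM-generated code before it is merged."""
    findings = []
    if SECRET_PATTERN.search(text):
        findings.append("possible hardcoded secret")
    if BROAD_IAM_PATTERN.search(text):
        findings.append("overly broad IAM statement (wildcard action/resource)")
    return findings
```

A pass with no findings is not a pass on correctness or edge cases; it only screens out the cheapest failures before human review.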

A static runbook is a fixed checklist. An intelligent runbook workflow:

  1. Ingests the symptom (alert text, ticket, short description).
  2. Retrieves relevant runbook sections and recent changes (deploys, config).
  3. Suggests next steps and links to dashboards or queries.
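The three steps above can be sketched end to end. Everything here is hypothetical: the in-memory keyword index stands in for a real retrieval layer (vector store plus deploy/change log), and the dashboard URLs are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    runbook_section: str
    next_step: str
    dashboard_link: str

# Hypothetical "retrieval" keyed by symptom keywords; a real system would
# query a vector store and recent deploys/config changes instead.
RUNBOOK_INDEX = {
    "disk": Suggestion("storage.md#disk-full",
                       "Check volume usage; prune old logs",
                       "https://grafana.example/d/disk"),
    "latency": Suggestion("api.md#latency",
                          "Compare p99 before and after the last deploy",
                          "https://grafana.example/d/latency"),
}

def triage(alert_text: str) -> list[Suggestion]:
    """Step 1: ingest the symptom. Step 2: retrieve matching runbook
    sections. Step 3: return suggested next steps with dashboard links."""
    text = alert_text.lower()
    return [s for key, s in RUNBOOK_INDEX.items() if key in text]
```

The return value is suggestions only; nothing in this flow executes a remediation.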

You may not have built one end-to-end; you should still be able to design the architecture: retrieval, policy gates, and human approval. That usually leads to RAG—see RAG for Incident Operations.

Common pattern: send log excerpts, stack traces, or trace IDs to an LLM with strict instructions to:

  • summarize what failed,
  • propose hypotheses ranked by likelihood,
  • suggest queries to confirm or deny.
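One way to encode those strict instructions is a fixed prompt template that forbids write actions and forces the three-section response shape. The template wording and the character budget below are assumptions, not a standard.

```python
TRIAGE_PROMPT = """You are assisting with incident triage. Do NOT propose any
write or remediation actions.

Log excerpt:
{log_excerpt}

Respond with exactly three sections:
1. SUMMARY: one paragraph on what failed.
2. HYPOTHESES: causes ranked by likelihood, most likely first.
3. QUERIES: read-only metric/log queries to confirm or rule out each hypothesis.
"""

def build_triage_prompt(log_excerpt: str, max_chars: int = 4000) -> str:
    # Truncate to stay within a context budget (the limit is illustrative).
    return TRIAGE_PROMPT.format(log_excerpt=log_excerpt[:max_chars])
```

Keeping the template in version control makes the instructions reviewable and auditable like any other operational artifact.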

Retrieval-Augmented Generation (RAG) improves grounding by pulling similar past incidents and runbooks into the prompt instead of relying on the model’s parametric memory alone.
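The retrieval half of that pattern can be shown with plain bag-of-words cosine similarity; real systems use embedding models, but word counts keep the sketch dependency-free.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank past incidents/runbook snippets by similarity to the symptom,
    so the top k can be placed into the prompt for grounding."""
    qv = Counter(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: cosine(qv, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]
```

Whatever the similarity backend, the output is the same: a few grounded snippets prepended to the prompt instead of relying on parametric memory.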

  • Default to read-only suggestions until reviewed.
  • Log prompts and outputs for audit on production-impacting paths.
  • Use Evaluating LLM Outputs for acceptance criteria.
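The audit-logging guardrail can be a thin wrapper around the model call. The list-based log sink here is a stand-in; production would write to durable, append-only storage.

```python
import json
import time

def audited_llm_call(prompt: str, llm, audit_log: list) -> str:
    """Record prompt and output before returning, so every
    production-impacting LLM interaction is reviewable after the fact.
    `llm` is any callable mapping a prompt string to a response string."""
    output = llm(prompt)
    audit_log.append(json.dumps({
        "ts": time.time(),
        "prompt": prompt,
        "output": output,
    }))
    return output
```

Because the wrapper only reads and records, it fits the read-only default above: approval gates for write actions sit outside it.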