60-Day AIOps Learning Plan

First published by Atif Alam

By the end of 60 days, you should have hands-on artifacts (sample RAG app, anomaly notebook, eval notes) and a one-pager you could discuss in an internal review or with stakeholders.

Weeks 1–2

Focus: Build intuition for what LLMs do well and where they fail.

  • Use Cursor, Continue.dev, Aider, or Claude / GPT-4 for:
    • drafting runbooks,
    • parsing log snippets,
    • drafting post-mortem sections.
  • Deliverable: 3 short runbook drafts + a “failure log” (where the model was wrong and why).

Weeks 3–4

Focus: Move past UI tours to understand how signals become anomalies or correlated incidents.

  • Pick one: Datadog AI features, AWS DevOps Guru, or similar.
  • Deliverable: Notes on inputs (metrics/logs), outputs (stories, insights), and limitations.
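To make "signals become anomalies" concrete, here is a minimal sketch of one common baseline: a rolling z-score over a trailing window. It is not how Datadog or DevOps Guru work internally (their methods are not described here); the function name, window size, threshold, and sample latency values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing window
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical latency samples in milliseconds, with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101, 100]
spikes = rolling_zscore_anomalies(latency_ms)
print(spikes)
```

One limitation worth recording in your notes: after the spike enters the trailing window, it inflates the standard deviation, so a second anomaly shortly afterwards could be missed. Vendor tools presumably handle this kind of case differently, which is exactly what the deliverable asks you to find out.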

Weeks 5–6: RAG conceptually (and lightly in code)

Focus: Chunking, embeddings, retrieval, grounding.

  • Skim the LangChain or LlamaIndex docs to build a mental model; even a tiny prototype counts.
  • Deliverable: Diagram + minimal RAG prototype (see AIOps Tooling and Stack).
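The four concepts in the focus line (chunking, embeddings, retrieval, grounding) can be sketched without any framework. The example below is a toy under stated assumptions: word counts stand in for a real embedding model, the runbook text is invented, and the final prompt is only assembled, not sent to a model. All function names are hypothetical and do not come from LangChain or LlamaIndex.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Chunking: split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase token counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Retrieval: rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def grounded_prompt(query, context_chunks):
    """Grounding: instruct the model to answer only from retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical runbook snippets.
runbooks = [
    "Disk full on database host: check /var/lib usage, rotate logs, then expand the volume.",
    "High API latency: inspect upstream dependency dashboards and recent deploys before scaling out.",
]
chunks = [c for doc in runbooks for c in chunk(doc)]
question = "What do I do about high API latency?"
top = retrieve(question, chunks, k=1)
prompt = grounded_prompt(question, top)
print(prompt)
```

Note that the fixed 12-word window cuts each runbook mid-sentence, so the retrieved chunk loses its last word or two. Seeing that failure in a toy is a useful prompt for the diagram: where would overlap or sentence-aware chunking go?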

Weeks 7–8: Strategy and stakeholder readiness

Focus: Connect your past operational pain to AI interventions.

  • Draft a mock “AI Strategy for SRE” document:
    • toil map → intervention,
    • 30/60/90 day rollout,
    • risks and metrics.
  • Deliverable: 2-page doc you can present.

Suggested hands-on projects, aligned with AIOps Tooling and Stack:

  1. LangChain + ChromaDB + Anthropic — RAG over runbooks.
  2. Prophet or isolation forest — sample metrics anomalies.
  3. Instructor + Anthropic — structured extraction from logs.
  4. Ragas — evaluate your RAG pipeline.
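For the structured-extraction project, most of the design work is deciding on the target schema and what to do when a value fails validation. The sketch below shows that idea offline: a plain dataclass stands in for the Pydantic-style schema you would hand to Instructor, and a regular expression stands in for the model call. The log format, field names, and allowed levels are assumptions made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed set of valid levels; adjust to your own logging conventions.
LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}


@dataclass(frozen=True)
class LogEvent:
    """Target schema: the fields every extracted event must provide."""
    timestamp: str
    level: str
    service: str
    message: str

    def __post_init__(self):
        # Validation step: reject values outside the schema's vocabulary.
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")


# Hypothetical line format: "<timestamp> <LEVEL> [<service>] <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)


def extract(line: str) -> Optional[LogEvent]:
    """Stand-in for the model call: return a validated event, or None on failure."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    try:
        return LogEvent(**match.groupdict())
    except ValueError:
        return None


sample = "2024-05-01T12:00:00Z ERROR [checkout] payment gateway timeout after 30s"
event = extract(sample)
print(event)
```

When you swap the regex for a model, the schema and the "return None or retry on validation failure" path stay the same; only the body of `extract` changes.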

What to skip: deep model training, full MLOps platforms, and heavy math, unless your role explicitly requires them. This plan targets consuming and deploying AI for operations.