Playbook

RAG Systems

Retrieval that stays reliable as your docs change.

A working RAG system is more than embeddings. Treat ingestion, retrieval, and answer assembly as separate, testable layers.

Ingestion + chunking

Preserve structure and provenance.

  • Normalize sources (PDF, DOCX, HTML) into a stable schema
  • Chunk on headings or semantic boundaries rather than at fixed lengths
  • Store metadata for ownership, access, and timestamps
  • Version documents so you can roll back or compare
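
As a rough illustration of the points above, here is a minimal sketch of heading-based chunking that carries provenance with every chunk. It assumes sources were already normalized to Markdown-style text with "#" headings. The names Chunk, content_version, and chunk_by_headings are hypothetical, and the content-hash version is just one possible versioning scheme.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Assumption: normalized documents use Markdown-style "#" headings.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    doc_id: str
    version: str          # content hash of the whole document
    heading_path: tuple   # e.g. ("Guide", "Install")
    text: str
    metadata: dict = field(default_factory=dict)


def content_version(text: str) -> str:
    """Derive a short, stable version id from the document content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def chunk_by_headings(doc_id, text, metadata=None):
    """Split a document at headings, one chunk per non-empty section."""
    version = content_version(text)
    chunks = []
    path = []  # stack of (level, title) for the current section
    buf = []   # body lines collected for the current section

    def flush():
        body = "\n".join(buf).strip()
        if body:
            titles = tuple(title for _, title in path)
            chunks.append(Chunk(doc_id, version, titles, body, dict(metadata or {})))
        buf.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before changing the path
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return chunks
```

Because the version id is derived from content, re-ingesting an unchanged document yields the same id, which makes it cheap to compare or roll back stored versions.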

Retrieval strategy

Get the right context before generation.

  • Hybrid search (BM25 + vector) usually beats pure embedding search, especially for exact terms like IDs and error codes
  • Use metadata filters and access control in retrieval
  • Rerank top results with a lightweight model
  • Cache frequent queries, and expire cached results after a freshness window so updated docs surface
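
One common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a minimal version under stated assumptions: each retriever already returns an ordered list of document ids, access rights are modeled as a simple group overlap, and k=60 is a conventional default rather than a tuned value. The function names and the doc_groups mapping are hypothetical.

```python
from collections import defaultdict


def filter_by_access(doc_ids, doc_groups, user_groups):
    """Keep only documents that share at least one group with the user.

    Documents with no recorded groups are denied by default.
    """
    return [d for d in doc_ids if doc_groups.get(d, set()) & user_groups]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; each hit adds 1 / (k + rank) to its score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(bm25_ranking, vector_ranking, doc_groups, user_groups, k=60):
    """Apply access filters to each ranking first, then fuse the results."""
    allowed = [
        filter_by_access(ranking, doc_groups, user_groups)
        for ranking in (bm25_ranking, vector_ranking)
    ]
    return reciprocal_rank_fusion(allowed, k=k)
```

Filtering before fusion keeps restricted documents out of the candidate set entirely, so they cannot reach a reranker or the prompt.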

Answer assembly

Citations are not optional in production.

  • Prompt with explicit citation requirements
  • Refuse when sources are missing or retrieval confidence is low
  • Use a strict answer schema so the output format does not drift
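
To make the refusal and schema rules concrete, here is a minimal sketch of a post-generation check. It assumes the model returns a draft answer plus the chunk ids it cites, and that retrieval attaches a relevance score to each chunk. The Answer type, the assemble_answer function, and the 0.5 threshold are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple
    refused: bool = False


REFUSAL = Answer(
    text="I could not find a reliable source for this.",
    citations=(),
    refused=True,
)


def assemble_answer(draft_text, cited_ids, retrieved, min_score=0.5):
    """Return the draft only if every citation points at a trusted chunk.

    retrieved maps chunk id -> relevance score from the retrieval layer.
    """
    # Chunks scoring below the threshold are treated as low confidence.
    supported = {cid for cid, score in retrieved.items() if score >= min_score}
    if not cited_ids:
        return REFUSAL  # an uncited answer is rejected
    if any(cid not in supported for cid in cited_ids):
        return REFUSAL  # cites a chunk that is unknown or low confidence
    return Answer(text=draft_text, citations=tuple(cited_ids))
```

Returning one fixed type for both answers and refusals means downstream code never has to parse free-form text to tell them apart.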

Failure modes

  • Stale or missing docs leading to hallucinations
  • Overfetching irrelevant context
  • Conflicting sources without disambiguation
  • No visibility into retrieval quality

Checklist

  • Test set that covers top queries and edge cases
  • Retrieval quality dashboard
  • Citation enforcement in prompts
  • Access filters at retrieval time
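
A test set and a quality dashboard both need a metric to track. The sketch below computes two common ones, recall@k and mean reciprocal rank (MRR), over a labeled test set. It assumes each test case lists the ids of its relevant documents and that retrieve is whatever function wraps your retriever. All names here are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(test_set, retrieve, k=5):
    """Average recall@k and MRR over all test cases.

    test_set: list of {"query": str, "relevant": set of doc ids}
    retrieve: callable mapping a query string to a ranked list of doc ids
    """
    if not test_set:
        raise ValueError("test_set must contain at least one case")
    recalls, rranks = [], []
    for case in test_set:
        results = retrieve(case["query"])
        recalls.append(recall_at_k(results, case["relevant"], k))
        rranks.append(reciprocal_rank(results, case["relevant"]))
    n = len(test_set)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rranks) / n}
```

Running this after every ingestion or index change gives the dashboard a trend line and makes retrieval regressions visible before users report them.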