7 levels of Claude Code and RAG
3 min read
Originally from vm.tiktok.com
Summary
A 7-level maturity model for giving Claude Code durable memory, from the built-in auto-memory files all the way up to agentic RAG with multimodal ingestion. The core argument: most people should stop at level 4 (Obsidian vault with index hierarchy) because it handles thousands of documents without the cost and fragility of real RAG, and context rot makes a bloated CLAUDE.md actively harmful.
Key Insight
The 7 levels
- L1 auto-memory. The .claude/projects/.../memory/ files Claude writes on its own. Cute, low signal, often shoehorns irrelevant trivia into answers. Trap: users stay here and cope by never clearing the session, bloating the 1M context window (effectiveness drops from 92% at 256k to 78% by the time it is full).
- L2 CLAUDE.md. The first “real” memory. Trap: a bloated rule book. The cited study (the evaluatingagents.md line of research) shows AGENTS.md/CLAUDE.md files can reduce LLM effectiveness because they are injected into every prompt. Rule: if it is not relevant to virtually every prompt, it should not live in CLAUDE.md.
- L3 multi-file state. Break memory into project.md, requirements.md, roadmap.md, state.md etc., with CLAUDE.md as the index that points to them. Tools like GSD (Get Shit Done) and superpowers-style orchestrators already do this. Crude chunking, but it scales further than a single file.
- L4 Obsidian vault. Karpathy’s LLM knowledge base pattern: vault root, a raw/ folder for ingestion staging, and a wiki/ folder with a master index, per-topic subfolders, and article markdown. Claude Code traverses vault → wiki → index → article via grep. The claim: this is the 80% (really 99%) solution for solo operators - free, no infra, handles thousands of docs.
- L5 naive RAG. Chunks → embeddings → vector DB → similarity search at query time. Understand how it works so you can evaluate vendors - if a vendor sells “RAG” with no re-ranker and no graph, it is basically a dressed-up Ctrl-F with ~25% accuracy. Data point cited: a 2025 study of textual RAG vs a textual LLM showed RAG ~1200x cheaper and faster for correct answers (pre-Claude-Code, older models; the gap has narrowed but not disappeared).
- L6 GraphRAG / LightRAG. Entities + relationships, not siloed vectors. LightRAG’s GitHub benchmarks show jumps of often >100% over naive RAG across comprehensiveness, diversity, and empowerment (e.g. 31.6 → 68.4, 24 → 76). Obsidian backlinks look like GraphRAG visually but are manual and arbitrary - LightRAG relationships are model-extracted from the content.
- L7 agentic + multimodal RAG. A RAG-Anything + LightRAG combo for scanned PDFs, images, and tables. Gemini Embedding 2 (March 2026) handles video embeddings directly, not just transcripts. At this level most of the infrastructure is data ingestion/sync pipelines (an n8n example is shown) and a top-of-funnel agent that routes between graph RAG, Postgres SQL, and other stores. Trap: forcing yourself here when Obsidian would do.
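The L4 vault traversal is simple enough to sketch in a few lines. A minimal illustration, assuming a Karpathy-style layout with a wiki/ folder of markdown articles; `find_articles` is a hypothetical helper, and the "grep" step is just a case-insensitive substring scan here (Claude Code shells out to real grep):

```python
from pathlib import Path

def find_articles(vault: Path, term: str) -> list[Path]:
    """Grep-style lookup over a vault: scan every markdown file under
    wiki/ for a term, mirroring the vault -> wiki -> index -> article hop."""
    hits = []
    for md in (vault / "wiki").rglob("*.md"):
        if term.lower() in md.read_text(encoding="utf-8").lower():
            hits.append(md)
    return hits
```

The point of the sketch: no embeddings, no infra - plain-text search over a well-indexed folder tree is the whole retrieval mechanism at this level.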
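The L5 pipeline (chunks → embeddings → similarity search) reduces to a toy you can hold in your head. A sketch under heavy simplification: the bag-of-words Counter stands in for a real learned embedding model, and `retrieve` is a hypothetical name, but the cosine-ranking loop is the same shape a vector DB runs at query time:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real systems use dense learned vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query, return the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

This is also why a vendor demo without a re-ranker or graph is a dressed-up Ctrl-F: nothing above understands that two chunks belong together, it only scores each one against the query in isolation.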
Non-obvious takeaways
- The 1M context window did not solve the memory problem - it created a new one. People now never clear sessions, effectiveness degrades linearly, and token burn explains much of the “Claude got nerfed” complaints.
- Context rot is why CLAUDE.md is a double-edged sword: the very fact that it is injected into every prompt is what makes it harmful when contents do not apply to every prompt.
- Before jumping to RAG, ask: do you actually need relationship queries (how does X interact with Y across separate docs), or just lookup? Lookup = Obsidian is enough. Relationships = you need a graph.
- Chunking artefacts are a silent killer in naive RAG. If chunk 3 references chunk 1 and only chunk 3 gets retrieved, the answer is nonsense. This is why re-rankers and graph structures exist.
- Ingestion, dedup, and update flow are the 80% of production RAG that demos ignore. Route documents through Google Drive or similar first, not directly into the vector DB.
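The dedup half of that last point can start as something very small: hash the content before it ever reaches the vector DB, so re-synced or re-uploaded copies are skipped. A minimal sketch - `ingest` and the in-memory `store` dict are hypothetical stand-ins for a real pipeline stage:

```python
import hashlib

def ingest(docs: list[str], store: dict[str, str]) -> int:
    """Content-hash dedup at the top of the ingestion funnel.
    Returns how many documents were actually new."""
    added = 0
    for doc in docs:
        key = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if key not in store:  # seen this exact content before? skip it
            store[key] = doc
            added += 1
    return added
```

A production version would also track update flow (same source document, changed content), which is exactly the part demos ignore.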