7 levels of Claude Code and RAG
3 min read
Originally from vm.tiktok.com
Summary
A 7-level maturity model for giving Claude Code durable memory, from the built-in auto-memory files all the way up to agentic RAG with multimodal ingestion. The core argument: most people should stop at level 4 (Obsidian vault with index hierarchy) because it handles thousands of documents without the cost and fragility of real RAG, and context rot makes a bloated CLAUDE.md actively harmful.
Key Insight
The 7 levels
- L1 auto-memory. The .claude/projects/.../memory/ files Claude writes on its own. Cute, low signal, often shoehorns irrelevant trivia into answers. Trap: users stay here and cope by never clearing the session, bloating the 1M context window (effectiveness drops from 92% at 256k to 78% by the time it is full).
- L2 CLAUDE.md. The first “real” memory. Trap: a bloated rule book. The cited study (the evaluatingagents.md line of research) shows AGENTS.md/CLAUDE.md files can reduce LLM effectiveness because they are injected into every prompt. Rule: if it is not relevant to virtually every prompt, it should not live in CLAUDE.md.
- L3 multi-file state. Break memory into project.md, requirements.md, roadmap.md, state.md etc., with CLAUDE.md as the index that points to them. Tools like GSD (Get Shit Done) and superpowers-style orchestrators already do this. Crude chunking, but it scales further than a single file.
- L4 Obsidian vault. Karpathy’s LLM knowledge base pattern: vault root, a raw/ folder for ingestion staging, and a wiki/ folder with a master index, per-topic subfolders, and article markdown. Claude Code traverses vault → wiki → index → article via grep. The claim: this is the 80% (really 99%) solution for solo operators - free, no infra, handles thousands of docs.
- L5 naive RAG. Chunks → embeddings → vector DB → similarity search at query time. Understand how it works so you can evaluate vendors - if a vendor sells “RAG” with no re-ranker and no graph, it is basically a dressed-up Ctrl-F with ~25% accuracy. Data point cited: a 2025 study of textual RAG vs a textual LLM showed RAG ~1200x cheaper and faster for correct answers (pre-Claude-Code, older models; the gap has narrowed but not disappeared).
- L6 GraphRAG / LightRAG. Entities + relationships, not siloed vectors. LightRAG’s GitHub benchmarks show jumps of often >100% over naive RAG across comprehensiveness, diversity, and empowerment (e.g. 31.6 → 68.4, 24 → 76). Obsidian backlinks look like GraphRAG visually but are manual and arbitrary - LightRAG relationships are model-extracted from the content.
- L7 agentic + multimodal RAG. A RAG-Anything + LightRAG combo for scanned PDFs, images, and tables. Gemini Embedding 2 (March 2026) handles video embeddings directly, not just transcripts. At this level most of the infrastructure is data ingestion/sync pipelines (an n8n example is shown) and a top-of-funnel agent that routes between graph RAG, Postgres SQL, and other stores. Trap: forcing yourself here when Obsidian would do.
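The L4 vault traversal is simple enough to sketch in a few lines. A minimal illustration, assuming a Karpathy-style layout with a wiki/ folder of markdown articles; `find_articles` is a hypothetical helper, and the "grep" step is just a case-insensitive substring scan here (Claude Code shells out to real grep):

```python
from pathlib import Path

def find_articles(vault: Path, term: str) -> list[Path]:
    """Grep-style lookup over a vault: scan every markdown file under
    wiki/ for a term, mirroring the vault -> wiki -> index -> article hop."""
    hits = []
    for md in (vault / "wiki").rglob("*.md"):
        if term.lower() in md.read_text(encoding="utf-8").lower():
            hits.append(md)
    return hits
```

The point of the sketch: no embeddings, no infra - plain-text search over a well-indexed folder tree is the whole retrieval mechanism at this level.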
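The L5 pipeline (chunks → embeddings → similarity search) reduces to a toy you can hold in your head. A sketch under heavy simplification: the bag-of-words Counter stands in for a real learned embedding model, and `retrieve` is a hypothetical name, but the cosine-ranking loop is the same shape a vector DB runs at query time:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real systems use dense learned vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query, return the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]
```

This is also why a vendor demo without a re-ranker or graph is a dressed-up Ctrl-F: nothing above understands that two chunks belong together, it only scores each one against the query in isolation.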
Non-obvious takeaways
- The 1M context window did not solve the memory problem - it created a new one. People now never clear sessions, effectiveness degrades linearly, and token burn explains much of the “Claude got nerfed” complaints.
- Context rot is why CLAUDE.md is a double-edged sword: the very fact that it is injected into every prompt is what makes it harmful when contents do not apply to every prompt.
- Before jumping to RAG, ask: do you actually need relationship queries (how does X interact with Y across separate docs), or just lookup? Lookup = Obsidian is enough. Relationships = you need a graph.
- Chunking artefacts are a silent killer in naive RAG. If chunk 3 references chunk 1 and only chunk 3 gets retrieved, the answer is nonsense. This is why re-rankers and graph structures exist.
- Ingestion, dedup, and update flow are the 80% of production RAG that demos ignore. Route documents through Google Drive or similar first, not directly into the vector DB.
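The dedup half of that last point can start as something very small: hash the content before it ever reaches the vector DB, so re-synced or re-uploaded copies are skipped. A minimal sketch - `ingest` and the in-memory `store` dict are hypothetical stand-ins for a real pipeline stage:

```python
import hashlib

def ingest(docs: list[str], store: dict[str, str]) -> int:
    """Content-hash dedup at the top of the ingestion funnel.
    Returns how many documents were actually new."""
    added = 0
    for doc in docs:
        key = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if key not in store:  # seen this exact content before? skip it
            store[key] = doc
            added += 1
    return added
```

A production version would also track update flow (same source document, changed content), which is exactly the part demos ignore.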