RAG vs long context vs fine-tuning: when to use each

1 min read
rag · fine-tuning · long-context · llm · raft · enterprise-ai
Originally from vm.tiktok.com

My notes


Summary

Quick 42-second framing of when to use RAG, long-context prompting, and fine-tuning with LLMs. RAG = open-book retrieval with citations; long-context = brute-forcing more tokens into the prompt (quality degrades past a point); fine-tuning = changes style/tone/structure, not knowledge. Pros combine RAG + fine-tuning into RAFT.

Key Insight

  • RAG = open-book exam. Model pulls real company data per query and cites sources. Best when knowledge changes or you need traceability.
  • Long context = simple, looks powerful, but quality degrades past a threshold even on top models. Enterprise databases never fit in 1M tokens anyway, so it’s a stopgap, not a solution.
  • Fine-tuning changes voice/structure, NOT knowledge. Retraining on every data update would bankrupt you, so don’t use fine-tuning to inject facts.
  • RAFT (Retrieval-Augmented Fine-Tuning) is the pro move: fine-tune for behavior + retrieve for knowledge. Decouples the two.
  • Common mistake: people pick fine-tuning when they actually need RAG (because they want the model to “know” their data).
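The RAG bullet above can be sketched in a few lines: retrieve the most relevant passages per query, then build an "open-book" prompt that carries source IDs so the answer can cite them. This is an illustrative toy, with a keyword-overlap scorer and a made-up mini corpus standing in for the embedding search and enterprise data a real system would use.

```python
def tokenize(text):
    return set(text.lower().split())

def retrieve(query, corpus, k=2):
    """Rank documents by naive keyword overlap with the query.
    Real systems use embeddings plus a vector index instead."""
    q = tokenize(query)
    scored = sorted(corpus,
                    key=lambda d: len(q & tokenize(d["text"])),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Assemble the open-book prompt: retrieved passages tagged with
    source IDs so the model can answer with citations."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return (f"Answer using only the sources below, citing [id].\n\n"
            f"{context}\n\nQ: {query}")

# Hypothetical company snippets; the point is that these can change
# daily without any retraining.
corpus = [
    {"id": "hr-7",  "text": "Parental leave is 16 weeks, updated January 2025."},
    {"id": "it-2",  "text": "VPN access requires hardware keys as of March."},
    {"id": "fin-9", "text": "Expense reports are due by the 5th of each month."},
]

query = "How many weeks of parental leave do we get?"
docs = retrieve(query, corpus)
prompt = build_prompt(query, docs)
```

Swapping the corpus updates the model's "knowledge" instantly, which is exactly why facts belong in retrieval rather than in fine-tuned weights.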