Universal CLAUDE.md to Reduce Claude Output Tokens by ~63%

Tags: claude-code, token-optimization, claude-md, prompt-engineering, llm-cost-reduction, ai-tooling, developer-experience
Originally from github.com

My notes

Summary

A drop-in CLAUDE.md file that targets Claude’s default verbose behaviors (sycophantic openers, restated questions, unsolicited suggestions, em dashes) to reduce output token count by roughly 63% across typical prompts. The savings are directional rather than statistically rigorous, and the file itself adds input tokens on every message, so net benefit only materializes in high-output-volume workflows like automation pipelines and agent loops.
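The break-even point between output savings and per-message input overhead can be made concrete with a small cost model. This is a sketch under stated assumptions: the parameter values (300-token average responses, a 500-token CLAUDE.md, $15/MTok output and $3/MTok input pricing) are illustrative, not figures from the benchmark.

```python
# Hypothetical cost model for a token-trimming CLAUDE.md.
# All parameter values below are illustrative assumptions.

def monthly_net_savings(prompts_per_day, avg_output_tokens, reduction,
                        output_price_per_token, claude_md_tokens,
                        input_price_per_token, days=30):
    """Net monthly USD saved: output tokens avoided, minus the extra
    input tokens the CLAUDE.md file adds to every message."""
    saved = (prompts_per_day * days * avg_output_tokens
             * reduction * output_price_per_token)
    overhead = (prompts_per_day * days * claude_md_tokens
                * input_price_per_token)
    return saved - overhead

# High-volume workflow: savings outweigh the per-message overhead.
high = monthly_net_savings(100, 300, 0.63, 15e-6, 500, 3e-6)

# Low-volume casual use: the file is a net token (and cost) increase.
low = monthly_net_savings(1, 10, 0.63, 15e-6, 5000, 3e-6)
```

The sign flip between the two cases is the article's core trade-off: the file pays for itself only when output volume is high relative to the fixed context cost it adds.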

Key Insights

  • The 63% reduction claim comes from a 5-prompt benchmark with no variance controls or repeated runs, so treat it as a directional signal, not a precise measurement. At 100 prompts/day the estimated monthly savings on Sonnet is only ~$0.86, scaling linearly to ~$25.92 across 3 projects at 1,000 prompts/day.
  • The real value is the curated list of 12 specific anti-patterns (sycophantic openers, hollow closings, restating the prompt, em dashes/smart quotes, “As an AI” framing, unnecessary disclaimers, unsolicited suggestions, over-engineered code, hallucination on uncertain facts, ignoring corrections, redundant file reads, scope creep) with concrete fix rules for each. This taxonomy is more useful than the file itself.
  • Key design insight: CLAUDE.md files compose across three levels (global ~/.claude/CLAUDE.md, project-level, subdirectory-level). Keeping general preferences global and project-specific constraints local avoids bloating any single file.
  • The file includes an override mechanism: user instructions always win, so explicitly requesting verbose or detailed output still works.
  • Profiles for different use cases (coding, agents, analysis) acknowledge that compression levels should vary by task type.
  • The honest trade-off disclosure is notable: on single short queries or low-volume casual use, the file is a net token increase because it loads into context on every message.
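Taken together, the anti-pattern rules, the override mechanism, and the global-vs-local split might look roughly like this in a global CLAUDE.md. This is a hypothetical condensed fragment for illustration, not the actual file:

```markdown
# ~/.claude/CLAUDE.md  — global: general style preferences only.
# Project-specific constraints belong in <repo>/CLAUDE.md instead.

## Output style
- No sycophantic openers ("Great question!") and no hollow closings.
- Do not restate the prompt before answering.
- No unsolicited suggestions; answer only what was asked (no scope creep).
- Plain punctuation: no em dashes, no smart quotes.
- No "As an AI" framing or unnecessary disclaimers.
- If a fact is uncertain, say so rather than guessing.
- Do not re-read files already in context.

## Override
- Explicit user instructions always win. If the user asks for verbose
  or detailed output, ignore the compression rules above.
```

Keeping only general preferences at the global level, as sketched here, is what lets project and subdirectory files stay small while still composing with these defaults.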