Caveman: Claude Code skill cuts output tokens 65% via caveman-speak

Tags: claude-code, token-optimization, llm, prompt-compression, coding-agents, caveman, brevity
Originally from github.com

My notes

Summary

Caveman is a Claude Code skill/plugin that instructs the agent to respond in caveman-speak (dropping articles, filler, hedging) to cut output tokens by roughly 65% on average while preserving technical accuracy. Ships with multi-agent install (Claude Code, Codex, Gemini, Cursor, Windsurf, Copilot, Cline), mode switching (lite/full/ultra/wenyan), and a compress tool that rewrites memory files to cut input tokens by about 46% every session.

Key Insights

  • Real benchmark numbers from the Claude API, not marketing. A 10-task eval averages a drop from 1,214 to 294 output tokens (65% saved). Per-task savings range from 22% to 87%: best on explanatory prose (React re-render bug: 87%), worst on already-dense architecture discussions (22%).
  • Only output tokens get compressed; reasoning/thinking tokens are untouched. So the win is readability, speed, and output cost, not raw "make the model cheaper to think."
  • Caveman-Compress is the sleeper feature. It rewrites memory files into caveman-speak, keeps a human-readable backup, and saves around 46% on every session-start context load. Code blocks, URLs, paths, commands, dates, and headings pass through untouched; only prose gets compressed.
  • The three-arm eval methodology (verbose vs. skill vs. plain-terse) is specifically designed to rule out the "you are just comparing against verbose Claude" objection. Worth stealing as a pattern when benchmarking any prompt technique.
  • A cited paper backs the angle: a March 2026 arXiv paper, "Brevity Constraints Reverse Performance Hierarchies in Language Models," found brief-response constraints improved accuracy by 26 percentage points on certain benchmarks. Verbose does not equal correct.
  • Wenyan mode compresses in Classical Chinese, arguably the most token-dense written language. A novelty, but also a real lever if token budget is the hard constraint.
  • The always-on snippet is portable; it works in any agent's system prompt without the plugin infrastructure:

    “Terse like caveman. Technical substance exact. Only fluff die. Drop: articles, filler (just/really/basically), pleasantries, hedging. Fragments OK. Short synonyms. Code unchanged.”

  • Auto-activation only works where the agent has hooks (Claude Code SessionStart, Gemini context files, Codex .codex/hooks.json). Cursor/Windsurf/Cline/Copilot need manual rule-file additions.
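The Caveman-Compress pass-through rule (code fences and headings untouched, prose rewritten) can be sketched as a simple line-based pass. This is an assumption-laden illustration, not the actual Caveman-Compress implementation; the filler-word regex is a stand-in for the real caveman-speak rewriter, and the real tool also preserves URLs, paths, commands, and dates.

```python
import re

FENCE = re.compile(r"^```")

def compress_prose(line: str) -> str:
    # Placeholder rewrite: drop articles and common filler words.
    # Stand-in for the real caveman-speak rewriter (hypothetical).
    return re.sub(r"\b(the|a|an|just|really|basically)\b\s*", "", line, flags=re.I)

def compress_memory(text: str) -> str:
    out, in_code = [], False
    for line in text.splitlines():
        if FENCE.match(line):
            in_code = not in_code
            out.append(line)            # fences pass through untouched
        elif in_code or line.startswith("#"):
            out.append(line)            # code and headings pass through
        else:
            out.append(compress_prose(line))
    return "\n".join(out)
```

The design point the sketch captures: compression is a structural filter first and a rewriter second, which is why a human-readable backup plus deterministic pass-through makes it safe to run on memory files every session.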
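The three-arm methodology reduces to comparing percentage savings against the same verbose baseline. A minimal sketch, with invented token counts for illustration (not the repo's actual eval data):

```python
def savings(baseline: int, treatment: int) -> float:
    """Percent of output tokens saved relative to the baseline arm."""
    return round(100 * (baseline - treatment) / baseline, 1)

# Hypothetical per-task output token counts for the three arms:
# verbose default, caveman skill, and a plain "be terse" instruction.
verbose_tokens = 1200
caveman_tokens = 300
plain_terse_tokens = 700

skill_savings = savings(verbose_tokens, caveman_tokens)       # → 75.0
terse_savings = savings(verbose_tokens, plain_terse_tokens)   # → 41.7
```

The plain-terse arm is the control that answers the objection: if the skill only beats verbose Claude, it proves nothing; it has to also beat a generic terseness instruction.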