video-use: edit videos with Claude Code via transcripts

Tags: video-editing, claude-code, ai-agents, ffmpeg, automation, open-source, content-creation
Originally from github.com

My notes

Summary

Video-use is an open-source Claude Code skill that turns raw video footage into edited final cuts through conversation. Instead of frame-dumping (which would cost ~45M tokens), it reads video through word-level audio transcripts (~12KB) plus on-demand visual filmstrips, achieving production-quality cuts at a fraction of the token cost.
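The token arithmetic behind that comparison is easy to check. A minimal sketch, assuming the article's figures (30,000 frames at ~1,500 tokens per frame) and a rough ~4 bytes per token for English text:

```python
# Back-of-envelope token budget: frame-dumping vs. a transcript.
# Frame count and tokens-per-frame come from the article; the
# bytes-per-token figure is a rough assumption, not a measurement.
frames = 30_000
tokens_per_frame = 1_500
frame_dump_tokens = frames * tokens_per_frame            # 45,000,000 (~45M)

transcript_bytes = 12 * 1024                             # ~12KB transcript
bytes_per_token = 4                                      # rough English average
transcript_tokens = transcript_bytes // bytes_per_token  # 3,072 (~3K)

ratio = frame_dump_tokens // transcript_tokens
print(f"{frame_dump_tokens:,} vs {transcript_tokens:,} tokens (~{ratio:,}x)")
```

Even allowing generous room for the on-demand filmstrip PNGs, the transcript path stays four orders of magnitude cheaper.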

Key Insight

  • The core architectural insight is treating video the same way browser-use treats web pages: give the LLM a structured representation (transcript + selective filmstrips) instead of raw pixels. This reduces 30,000 frames × 1,500 tokens each (~45M tokens) to ~12KB of text plus a handful of PNGs.
  • Uses ElevenLabs Scribe for word-level timestamps and speaker diarization, enabling word-boundary-precise cuts.
  • Automatic production polish: filler-word removal (umm, uh, false starts), dead-space trimming, 30 ms audio fades at cut points, automatic color grading, and burned-in subtitles.
  • A self-evaluation loop runs `timeline_view` on the rendered output at every cut boundary, catching visual jumps and audio pops before showing the user, with at most three fix-and-re-render cycles.
  • Generates animation overlays via Manim or Remotion, spawning a parallel sub-agent per animation.
  • Session memory persists in `project.md`, so editing sessions can span multiple days.
  • Installed as a Claude Code skill via a symlink to `~/.claude/skills/video-use`.
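To make the word-boundary cutting concrete, here is a minimal sketch of the pipeline the bullets describe: collapse a word-level transcript into keep-spans (skipping fillers), then express those spans as an ffmpeg `select`/`aselect` filter. The filler list, the `keep_segments` helper, and the single-pass filter approach are my own simplifications for illustration, not the skill's actual implementation (which also adds the 30 ms audio fades at each cut).

```python
FILLERS = {"um", "umm", "uh", "like"}  # assumption: trivial filler list

def keep_segments(words, gap=0.03):
    """Collapse (text, start, end) word timestamps into keep-spans,
    dropping filler words and merging words separated by < `gap` sec."""
    segments = []
    for text, start, end in words:
        if text.lower().strip(".,") in FILLERS:
            continue
        if segments and start - segments[-1][1] < gap:
            segments[-1] = (segments[-1][0], end)  # extend previous span
        else:
            segments.append((start, end))
    return segments

def ffmpeg_cut_cmd(src, dst, segments):
    """Build an ffmpeg command keeping only `segments`, using the
    select/aselect filters with summed between() expressions."""
    expr = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in segments)
    return [
        "ffmpeg", "-i", src,
        "-vf", f"select='{expr}',setpts=N/FRAME_RATE/TB",
        "-af", f"aselect='{expr}',asetpts=N/SR/TB",
        dst,
    ]

# Toy transcript in the word-level shape a Scribe-style API provides.
words = [("So", 0.00, 0.18), ("umm", 0.22, 0.55), ("today", 0.60, 0.95),
         ("we", 0.97, 1.05), ("uh", 1.40, 1.70), ("start", 1.75, 2.10)]
segs = keep_segments(words)
print(segs)  # → [(0.0, 0.18), (0.6, 1.05), (1.75, 2.1)]
print(" ".join(ffmpeg_cut_cmd("raw.mp4", "cut.mp4", segs)))
```

The `setpts`/`asetpts` resets renumber timestamps so the kept frames play back contiguously; without them the output would stall through the removed spans.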