video-use: edit videos with Claude Code via transcripts
Originally from github.com
Summary
Video-use is an open-source Claude Code skill that turns raw video footage into edited final cuts through conversation. Instead of frame-dumping (which would cost ~45M tokens), it reads video through word-level audio transcripts (~12KB) plus on-demand visual filmstrips, achieving production-quality cuts at a fraction of the token cost.
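The token arithmetic behind that design choice can be made concrete. A minimal sketch, using the figures from the summary above (the variable names and the 4-bytes-per-token rule of thumb are illustrative assumptions, not part of the skill):

```python
# Rough token math: frame-dumping vs. transcript-reading.
# Figures come from the article; the bytes-per-token ratio is a
# common rule of thumb for English text, not a measured value.

FRAME_COUNT = 30_000           # sampled frames for the footage
TOKENS_PER_FRAME = 1_500       # vision tokens per frame
frame_dump_tokens = FRAME_COUNT * TOKENS_PER_FRAME   # 45,000,000

TRANSCRIPT_BYTES = 12 * 1024   # ~12KB word-level transcript
BYTES_PER_TOKEN = 4            # rule-of-thumb for English text
transcript_tokens = TRANSCRIPT_BYTES // BYTES_PER_TOKEN

print(f"frame dump: {frame_dump_tokens:,} tokens")
print(f"transcript: ~{transcript_tokens:,} tokens "
      f"(~{frame_dump_tokens // transcript_tokens:,}x cheaper)")
```

Even allowing generous margins on the per-token estimates, the transcript path is four orders of magnitude cheaper, which is what makes conversational editing of long footage feasible at all.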
Key Insight
- The core architectural insight is treating video the way browser-use treats web pages: give the LLM a structured representation (transcript plus selective filmstrips) instead of raw pixels. This reduces ~30,000 frames × 1,500 tokens (~45M) to ~12KB of text plus a handful of PNGs.
- Uses ElevenLabs Scribe for word-level timestamps and speaker diarization, enabling word-boundary-precise cuts.
- Automatic production polish: filler word removal (umm, uh, false starts), dead space trimming, 30ms audio fades at cuts, auto color grading, and burned-in subtitles.
- Self-evaluation loop runs `timeline_view` on rendered output at every cut boundary, catching visual jumps and audio pops before showing the user. Max 3 fix-and-re-render cycles.
- Generates animation overlays via Manim or Remotion, spawning parallel sub-agents per animation.
- Session memory persists in `project.md`, so editing sessions can span multiple days.
- Installed as a Claude Code skill via symlink to `~/.claude/skills/video-use`.
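The word-level timestamps are what make the filler-removal and dead-space trimming precise. A minimal sketch of turning a diarized transcript into word-boundary keep segments (the transcript shape and function name here are hypothetical, not ElevenLabs Scribe's actual schema or the skill's real code):

```python
# Plan keep-segments from word-level timestamps: drop filler words,
# and start a new segment whenever the silence gap exceeds a threshold.
# Transcript shape is a hypothetical simplification.

FILLERS = {"um", "umm", "uh"}
MAX_GAP = 0.8  # seconds of dead space tolerated inside one segment

def plan_cuts(words, max_gap=MAX_GAP):
    """words: [{'word', 'start', 'end'}] -> list of (start, end) to keep."""
    segments = []
    for w in words:
        if w["word"].lower().strip(",.") in FILLERS:
            continue  # filler word: excluded from every segment
        if segments and w["start"] - segments[-1][1] <= max_gap:
            segments[-1][1] = w["end"]               # extend current segment
        else:
            segments.append([w["start"], w["end"]])  # dead space: new segment
    return [tuple(s) for s in segments]

words = [
    {"word": "So",    "start": 0.0, "end": 0.2},
    {"word": "um,",   "start": 0.3, "end": 0.5},  # filler, dropped
    {"word": "today", "start": 0.6, "end": 1.0},
    {"word": "we",    "start": 3.5, "end": 3.6},  # 2.5s dead space before
    {"word": "ship",  "start": 3.7, "end": 4.0},
]
print(plan_cuts(words))  # → [(0.0, 1.0), (3.5, 4.0)]
```

A real implementation would also handle false starts (repeated phrases) and pad each boundary slightly so the 30ms audio fades have room to land.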
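The bounded self-evaluation loop described above can be sketched as a simple render/inspect/fix cycle (the function names are hypothetical stand-ins for the skill's tools, not its actual API):

```python
# Bounded self-check: render, inspect every cut boundary, and if problems
# are found (visual jumps, audio pops), fix and re-render -- at most 3 times.
# All callables are injected stand-ins; only the loop shape mirrors the text.

MAX_CYCLES = 3

def render_and_verify(timeline, render, inspect_boundary, fix):
    output = None
    for _ in range(MAX_CYCLES):
        output = render(timeline)
        issues = []
        for cut in timeline["cuts"]:
            problem = inspect_boundary(output, cut)  # e.g. timeline_view check
            if problem:
                issues.append(problem)
        if not issues:
            return output          # clean render: show the user
        timeline = fix(timeline, issues)
    return output                  # best effort after MAX_CYCLES attempts
```

Capping the loop at three cycles keeps a stubborn artifact from burning unbounded render time; the last render is returned as-is rather than looping forever.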