# video-use: edit videos with Claude Code via transcripts

> Video-use is a Claude Code skill that turns raw footage into edited cuts via word-level transcripts plus on-demand visual filmstrips, at a fraction of the token cost.

Published: 2026-04-17
URL: https://daniliants.com/insights/github---browser-usevideo-use/
Tags: video-editing, claude-code, ai-agents, ffmpeg, automation, open-source, content-creation

---

## Summary

Video-use is an open-source Claude Code skill that turns raw video footage into edited final cuts through conversation. Instead of feeding every frame to the model (roughly 45M tokens for the example footage), it reads the video through a word-level audio transcript (~12KB) plus visual filmstrips generated on demand, producing production-quality cuts at a fraction of the token cost.
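The arithmetic behind those figures can be checked in a few lines. This is a back-of-envelope sketch, not part of video-use: the 30,000-frame and 1,500-tokens-per-frame figures come from this summary, while the ~4 bytes-per-token conversion for the transcript is an assumed rule of thumb rather than a measured value for any particular tokenizer. The estimate also ignores the filmstrip PNGs, which add some image tokens on top.

```python
# Back-of-envelope token budget using the figures quoted in this summary.
# Assumption: ~4 bytes of English text per token (a rule of thumb, not a
# measured figure for any specific tokenizer).

FRAMES = 30_000            # frames in the example footage
TOKENS_PER_FRAME = 1_500   # approximate image tokens per frame
TRANSCRIPT_BYTES = 12_000  # ~12KB word-level transcript
BYTES_PER_TOKEN = 4        # hypothetical conversion factor


def frame_dump_tokens(frames: int, tokens_per_frame: int) -> int:
    """Tokens needed to show the model every single frame."""
    return frames * tokens_per_frame


def transcript_tokens(num_bytes: int, bytes_per_token: int) -> int:
    """Rough token estimate for a plain-text transcript (ceiling division)."""
    return -(-num_bytes // bytes_per_token)


if __name__ == "__main__":
    dump = frame_dump_tokens(FRAMES, TOKENS_PER_FRAME)
    text = transcript_tokens(TRANSCRIPT_BYTES, BYTES_PER_TOKEN)
    print(f"frame dump: {dump:,} tokens")   # frame dump: 45,000,000 tokens
    print(f"transcript: ~{text:,} tokens")  # transcript: ~3,000 tokens
    print(f"ratio: ~{dump // text:,}x")     # ratio: ~15,000x
```

Even if the bytes-per-token assumption is off by a factor of two, the transcript stays several thousand times cheaper than dumping frames.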

## Key Insight

- The core architectural insight is treating video the same way browser-use treats web pages: give the LLM a structured representation (transcript + selective filmstrips) instead of raw pixels. This replaces roughly 45M tokens of frames (30,000 frames × 1,500 tokens each) with ~12KB of text plus a handful of PNGs.
- Uses ElevenLabs Scribe for word-level timestamps and speaker diarization, enabling word-boundary-precise cuts.
- Automatic production polish: removal of filler words (umm, uh) and false starts, dead-space trimming, 30ms audio fades at each cut, automatic color grading, and burned-in subtitles.
- A self-evaluation loop runs `timeline_view` on the rendered output at every cut boundary, catching visual jumps and audio pops before the user sees the result. It is capped at 3 fix-and-re-render cycles.
- Generates animation overlays via Manim or Remotion, spawning one sub-agent per animation and running them in parallel.
- Session memory persists in `project.md` so editing sessions can span multiple days.
- Installed as a Claude Code skill via symlink to `~/.claude/skills/video-use`.
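To make word-boundary cutting concrete, here is a minimal sketch of how word-level timestamps could be turned into a cut list. It is not video-use's actual implementation: the `Word` type is a simplified stand-in for a transcription result (not ElevenLabs Scribe's real response schema), and `FILLERS`, `max_gap`, `keep_segments`, and `afade_chain` are hypothetical names chosen for illustration. Each kept segment becomes an ffmpeg audio filter chain built from the standard `atrim`, `asetpts`, and `afade` filters, with a 30ms fade in and out at every cut.

```python
from dataclasses import dataclass

# Hypothetical filler vocabulary; the skill's real rules are not documented here.
FILLERS = {"um", "umm", "uh", "uhh", "er", "ah"}
FADE_S = 0.03  # 30 ms audio fade at each cut boundary


@dataclass
class Word:
    """Simplified word-level timestamp (an assumed shape, not Scribe's schema)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def keep_segments(words: list[Word], max_gap: float = 0.5) -> list[tuple[float, float]]:
    """Group non-filler words into (start, end) segments cut on word boundaries.

    A new segment starts after any dropped filler word, or when the silence
    before the next kept word exceeds max_gap (dead-space trimming).
    """
    segments: list[list[float]] = []
    split_next = True  # force a fresh segment at the start and after fillers
    for w in words:
        if w.text.lower().strip(".,!?") in FILLERS:
            split_next = True
            continue
        if not split_next and w.start - segments[-1][1] <= max_gap:
            segments[-1][1] = w.end  # extend the current segment
        else:
            segments.append([w.start, w.end])
        split_next = False
    return [(s, e) for s, e in segments]


def afade_chain(start: float, end: float, fade: float = FADE_S) -> str:
    """ffmpeg audio filter chain that trims one segment and fades both edges."""
    dur = end - start
    fade = min(fade, dur / 2)  # keep the two fades from overlapping on tiny segments
    return (
        f"atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS,"
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={dur - fade:.3f}:d={fade:.3f}"
    )


if __name__ == "__main__":
    words = [
        Word("So", 0.00, 0.20),
        Word("um", 0.25, 0.55),
        Word("today", 0.60, 0.95),
        Word("we", 1.00, 1.10),
        Word("ship.", 3.00, 3.40),
    ]
    for seg in keep_segments(words):
        print(seg, afade_chain(*seg))
```

Because every segment starts and ends on a word boundary, fillers drop out cleanly and pauses longer than `max_gap` become cuts; the short fades at each edge are there to suppress audible clicks where segments are joined.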