Graphify: Knowledge Graph Skill for AI Coding Assistants

Tags: knowledge-graph, claude-code, tree-sitter, code-intelligence, rag-alternative, networkx, leiden-algorithm
Originally from graphify.net

My notes

Summary

Graphify is an open-source (MIT) skill that builds a multi-modal knowledge graph from code, docs, papers, and diagrams so AI coding assistants can understand large codebases without sending raw source to the model. It combines Tree-sitter static analysis with LLM-driven semantic extraction, builds a NetworkX graph, and uses Leiden community detection; no vector embeddings are required. It reports a 71.5x token reduction on a mixed Karpathy corpus (~1.7k tokens per query vs. ~123k naive).
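The static-analysis half of that pipeline can be sketched with Python's stdlib `ast` module standing in for Tree-sitter, and a plain adjacency map standing in for the NetworkX graph. Everything here (the sample source, `call_graph`) is illustrative, not Graphify's actual API:

```python
import ast
from collections import defaultdict

# Toy module to analyze; in Graphify, Tree-sitter parses many languages.
SOURCE = '''
def parse(path):
    return load(path)

def load(path):
    return open(path).read()

def build(path):
    tree = parse(path)
    return tree
'''

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the bare names it calls.

    A minimal stand-in for static call-graph extraction: walk the AST,
    and for every function definition collect the ast.Call nodes inside it.
    """
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    graph[node.name].add(sub.func.id)
    return dict(graph)

print(call_graph(SOURCE))
# → {'parse': {'load'}, 'load': {'open'}, 'build': {'parse'}}
```

The key point is that this structure (who calls whom) is explicit in the AST and survives intact in a graph, whereas chunk-and-embed RAG flattens it into unordered text windows.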

Key Insight

  • Graphs beat vector RAG for code understanding. Code has explicit structure (ASTs, call graphs, imports) that vectors flatten away. Graphify keeps that structure intact and layers semantic labels on top.
  • No embeddings, no vector store. Leiden community detection runs on graph topology alone, sidestepping the whole embedding-model + vector-DB stack most RAG setups need.
  • Privacy-conscious by default. Raw source never leaves the local machine. Only semantic descriptions (docstrings, concepts) go to the LLM, and it uses the model key the assistant already has configured.
  • Token compression at scale stays linear. On a ~500k-word corpus, BFS subgraph queries stay around 2k tokens vs. ~670k naive. That’s the real value prop: query cost doesn’t explode with codebase size.
  • “God nodes” and “surprise edges” are the concrete analytical outputs: highest-degree nodes in the graph (architectural keystones) plus unexpected cross-file/cross-domain connections (design smells or hidden coupling worth investigating).
  • Multi-modal means it reads diagrams too. Vision models extract concepts from images and PDFs, so architecture diagrams and research papers get merged into the same graph as the code.
  • Distribution note: PyPI package is graphifyy (double y), CLI is graphify. Easy to fat-finger.
  • Assistant integration is via slash commands (/graphify, /graphify query, /graphify path, /graphify explain) with manifests for Claude Code, OpenAI Codex, OpenCode. Any assistant that can run shell commands can invoke it.
  • Outputs are portable artifacts: graph.html, graph.json, GRAPH_REPORT.md. The graph persists, gets cached, and can be regenerated incrementally.
  • 3.7k+ GitHub stars signal real traction, and the MIT license plus permissive dependencies (NetworkX BSD, Tree-sitter MIT) make it commercially usable.
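Two of the query-time ideas above, depth-bounded BFS subgraphs and god nodes as highest-degree vertices, can be sketched over a plain adjacency map. The graph and function names are hypothetical stand-ins, not Graphify's CLI or internals:

```python
from collections import deque

# Toy undirected graph: node -> neighbours (stand-in for the NetworkX graph).
GRAPH = {
    "auth":    {"db", "session", "api"},
    "db":      {"auth", "models", "api"},
    "session": {"auth"},
    "api":     {"auth", "db", "routes"},
    "models":  {"db"},
    "routes":  {"api"},
}

def bfs_subgraph(graph, start, max_depth=2):
    """Nodes within max_depth hops of start.

    The returned neighbourhood is bounded by local degree and depth, not by
    total graph size — which is why query token cost stays roughly flat as
    the codebase grows.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

def god_nodes(graph, top=2):
    """Highest-degree nodes: candidate architectural keystones."""
    return sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:top]

print(bfs_subgraph(GRAPH, "session", max_depth=1))  # → {'session', 'auth'}
print(god_nodes(GRAPH, top=3))
```

"Surprise edges" would be the complementary query: edges whose endpoints sit in different Leiden communities or different directories, surfaced as candidates for hidden coupling.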