Graphify: Knowledge Graph Skill for AI Coding Assistants
Originally from graphify.net
My notes
Summary
Graphify is an open-source (MIT) skill that builds a multi-modal knowledge graph from code, docs, papers, and diagrams so AI coding assistants can understand large codebases without sending raw source to the model. It combines Tree-sitter static analysis with LLM-driven semantic extraction, builds a NetworkX graph, and uses Leiden community detection; no vector embeddings are required. It reports a 71.5x token reduction on a mixed Karpathy corpus (~1.7k tokens per query vs. ~123k naive).
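The parse-then-cluster pipeline can be sketched in miniature. The snippet below is an illustrative stand-in, not graphify's actual code: it uses Python's stdlib `ast` in place of Tree-sitter to extract a call graph, and NetworkX's Louvain implementation (a close relative of Leiden; Leiden proper usually comes via `igraph`/`leidenalg`) for community detection. All names and the sample source are hypothetical.

```python
# Sketch of the idea, not graphify's internals: parse code into a call
# graph, then cluster it on topology alone -- no embeddings involved.
import ast
import networkx as nx
from networkx.algorithms.community import louvain_communities

SOURCE = """
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    parse(load("data.txt"))
"""

def build_call_graph(source: str) -> nx.DiGraph:
    """Edges point from each function to the defined functions it calls."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    g = nx.DiGraph()
    g.add_nodes_from(funcs)
    for name, fn in funcs.items():
        for node in ast.walk(fn):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in funcs):
                g.add_edge(name, node.func.id)
    return g

graph = build_call_graph(SOURCE)
print(sorted(graph.edges()))  # [('main', 'load'), ('main', 'parse')]

# Cluster on graph structure alone (Louvain as a Leiden stand-in).
communities = louvain_communities(graph.to_undirected(), seed=0)
print(communities)
```

In graphify proper, Tree-sitter supplies language-agnostic ASTs and the LLM layers semantic labels onto nodes; the clustering step itself only ever sees graph topology, which is why no vector store is needed.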
Key Insight
- Graphs beat vector RAG for code understanding. Code has explicit structure (ASTs, call graphs, imports) that vectors flatten away. Graphify keeps that structure intact and layers semantic labels on top.
- No embeddings, no vector store. Leiden community detection runs on graph topology alone, sidestepping the whole embedding-model + vector-DB stack most RAG setups need.
- Privacy-conscious by default. Raw source never leaves the local machine. Only semantic descriptions (docstrings, concepts) go to the LLM, and it uses the model key the assistant already has configured.
- Token compression at scale stays linear. On a ~500k-word corpus, BFS subgraph queries stay around 2k tokens vs. ~670k naive. That's the real value prop: query cost doesn't explode with codebase size.
- “God nodes” and “surprise edges” are the concrete analytical outputs: highest-degree nodes in the graph (architectural keystones) plus unexpected cross-file/cross-domain connections (design smells or hidden coupling worth investigating).
- Multi-modal means it reads diagrams too. Vision models extract concepts from images and PDFs, so architecture diagrams and research papers get merged into the same graph as the code.
- Distribution note: the PyPI package is `graphifyy` (double y), while the CLI is `graphify`. Easy to fat-finger.
- Assistant integration is via slash commands (`/graphify`, `/graphify query`, `/graphify path`, `/graphify explain`) with manifests for Claude Code, OpenAI Codex, and OpenCode. Any assistant that can run shell commands can invoke it.
- Outputs are portable artifacts: `graph.html`, `graph.json`, `GRAPH_REPORT.md`. The graph persists, gets cached, and can be regenerated incrementally.
- 3.7k+ GitHub stars and an MIT license signal real traction; it's commercially usable, with all dependencies (NetworkX BSD, Tree-sitter MIT) permissive.
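Two of the ideas above (bounded BFS subgraph queries, and god nodes as highest-degree vertices) are easy to picture in NetworkX. This is a hedged sketch on a made-up toy graph, not graphify's implementation; the function names `query_subgraph` and `god_nodes` are hypothetical.

```python
# Illustrative only -- a toy graph standing in for a graphify output.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("auth", "db"), ("auth", "session"), ("auth", "api"),
    ("api", "router"), ("router", "handlers"), ("handlers", "db"),
    ("docs/arch.png", "api"),  # diagram-derived concepts share the graph
])

def query_subgraph(graph, start, depth=2):
    """Nodes within `depth` hops of `start`: the small context a query
    would send to the model instead of the whole codebase."""
    nodes = nx.ego_graph(graph, start, radius=depth).nodes()
    return graph.subgraph(nodes)

def god_nodes(graph, top=2):
    """Highest-degree nodes: candidate architectural keystones."""
    return sorted(graph.degree, key=lambda kv: kv[1], reverse=True)[:top]

sub = query_subgraph(g, "auth", depth=1)
print(sorted(sub.nodes()))  # ['api', 'auth', 'db', 'session']
print(god_nodes(g))
```

The subgraph is why query cost stays flat: it grows with the neighborhood radius, not with total corpus size. A "surprise edge" in this framing would be any edge whose endpoints sit in different communities or directories, e.g. a diagram node linking straight into a code module.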