Flipbook: an infinite visual browser generated by image models

generative-ui, image-models, web-experiments, infinite-browser, modal, south-park-commons
Originally from flipbook.page

My notes

Summary

Flipbook is an experimental infinite “visual browser” in which every page is an image generated on demand by an image model. All text is rendered as pixels, with no HTML or text overlays. Click anything inside the image to dive deeper; content is grounded by an agentic web search plus the model’s own world knowledge. Compute is sponsored by Modal, with backing from South Park Commons.
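To make the click-to-dive loop concrete, here is a minimal sketch of how such a browser could be wired up. It is not Flipbook's implementation: every name in it (Click, Page, normalize_click, build_prompt, VisualBrowser, and the generate and search callables) is hypothetical, and the image model and web search are passed in as plain functions so the loop can be followed without any real service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Click:
    """Click position as fractions of the image size (0.0 to 1.0)."""
    x: float
    y: float


@dataclass(frozen=True)
class Page:
    prompt: str   # text the page was generated from
    image: bytes  # rendered pixels, treated as opaque here


def normalize_click(px: int, py: int, width: int, height: int) -> Click:
    """Convert pixel coordinates to resolution-independent fractions."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if not (0 <= px < width and 0 <= py < height):
        raise ValueError("click is outside the image")
    return Click(px / width, py / height)


def build_prompt(parent: Page, click: Click, facts: List[str]) -> str:
    """Describe the next page: where the user clicked, plus grounding facts."""
    grounding = "; ".join(facts) if facts else "none"
    return (
        f"Zoom into the region at ({click.x:.2f}, {click.y:.2f}) of the page "
        f"generated from: {parent.prompt!r}. Grounding facts: {grounding}"
    )


class VisualBrowser:
    """Hypothetical navigation loop: there is no DOM, so a click is only a point."""

    def __init__(
        self,
        generate: Callable[[str], bytes],     # assumed image-model call
        search: Callable[[str], List[str]],   # assumed web-search call
    ) -> None:
        self._generate = generate
        self._search = search
        self.history: List[Page] = []

    def open(self, topic: str) -> Page:
        """Generate the first page for a topic."""
        facts = self._search(topic)
        grounding = "; ".join(facts) if facts else "none"
        prompt = f"Illustrated page about {topic}. Grounding facts: {grounding}"
        page = Page(prompt, self._generate(prompt))
        self.history.append(page)
        return page

    def click(self, px: int, py: int, width: int, height: int) -> Page:
        """Generate a deeper page from a click on the current one."""
        if not self.history:
            raise RuntimeError("open a page before clicking")
        parent = self.history[-1]
        point = normalize_click(px, py, width, height)
        facts = self._search(parent.prompt)
        prompt = build_prompt(parent, point, facts)
        page = Page(prompt, self._generate(prompt))
        self.history.append(page)
        return page
```

With stub callables in place of the model and the search step, calling open("volcanoes") and then click(512, 256, 1024, 1024) leaves two pages in history. Because the page has no link targets, the only thing carried from one page to the next in this sketch is the parent prompt plus a normalized point, which is one plausible reading of "just pixels."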

Key Insight

  • Wild bet on generative UI: replace HTML+CSS+JS with a single image stream. No DOM, no links, no fields, just pixels.
  • Text-as-pixels is the controversial design choice. It means no copy-paste, no a11y, no SEO, no Cmd+F. The team accepts this as a current-models tradeoff that “will get better.”
  • Factual accuracy bar set explicitly to ChatGPT/Gemini/Claude level. They admit hallucination risk and don’t claim deterministic correctness.
  • Optional “live video stream” mode animates static pages and creates seamless transitions. Currently stitches a custom video model on top of the image generator; goal is a single unified model.
  • Backed by Modal (compute sponsor) and South Park Commons. This suggests compute cost is the bottleneck, since generating every screen on demand is far more expensive than rendering HTML.
  • The pitch frames text UIs as “sipping an ocean of wisdom through a tiny straw.” Real product thesis: when models can render any visual at near-zero marginal cost, code-based UI becomes overkill for many tasks.
  • Hinted future direction: stateful pages that take actions and store data, eating into the territory currently held by apps and websites (e.g. trip research plus booking in the same surface).