Crawl entire websites with a single API call using Browser Rendering

Source

Summary

Cloudflare launched a new /crawl endpoint in open beta that crawls an entire website from a single API call and returns the content as HTML, Markdown, or structured JSON. It runs asynchronously, respects robots.txt by default, and works on both Free and Paid Workers plans, which makes it a turnkey alternative to self-hosted crawling infrastructure.
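
Because the crawl runs asynchronously, a client typically submits a job and then polls until it finishes. The following is a minimal sketch of that polling loop only, not Cloudflare's client code: the `fetch_status` callable, the job dictionary shape, and the "running" status string are all hypothetical placeholders for whatever the real API returns.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a crawl job until it leaves the assumed 'running' state.

    fetch_status is a caller-supplied function that returns the latest
    job state as a dict; how it talks to the API is left to the caller.
    """
    for _ in range(max_polls):
        job = fetch_status()
        # "running" is an assumed status value, not a documented one.
        if job.get("status") != "running":
            return job
        sleep(poll_seconds)
    raise TimeoutError("crawl job did not finish within the polling budget")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and makes it easy to exercise without network access.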

Key Insight

  • Unlike most crawling tools, this is a managed, serverless crawler that handles browser rendering (headless Chrome), page discovery (sitemaps + link following), and output formatting in one API call. No infrastructure to manage.
  • Incremental crawling via the modifiedSince and maxAge parameters means you can run repeated crawls without re-fetching unchanged pages, a critical cost saver for monitoring or RAG pipeline refresh workflows.
  • The render: false static mode is a smart addition: skip the browser overhead for static sites, dramatically reducing crawl time and cost.
  • Structured JSON output powered by Workers AI is the standout feature: you get LLM-ready content extraction without a separate parsing step. This collapses what was typically a three-tool pipeline (crawl -> render -> extract) into one call.
  • It self-identifies as a bot and cannot bypass Cloudflare bot detection or CAPTCHAs, so it is a legitimate data acquisition tool rather than a scraping tool for adversarial use cases.
  • Crawl scope controls (depth, page limits, URL wildcard patterns) give fine-grained control over what gets crawled, preventing runaway jobs.
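
The options above can be pictured as one request body. The sketch below only builds the request and does not send it. Apart from `modifiedSince`, `maxAge`, and `render`, which the notes above name, every field name (`url`, `formats`, `limit`, `depth`, `includePatterns`), the value types and units, and the endpoint path are assumptions made for illustration; check the Browser Rendering documentation for the actual schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Sequence

# Assumed base URL and path layout for the REST API.
API_BASE = "https://api.cloudflare.com/client/v4"


def build_crawl_body(
    start_url: str,
    *,
    formats: Sequence[str] = ("markdown",),
    render: bool = True,
    limit: Optional[int] = None,
    depth: Optional[int] = None,
    include_patterns: Optional[Sequence[str]] = None,
    modified_since: Optional[str] = None,
    max_age: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a crawl request body, leaving out options that are unset."""
    body: Dict[str, Any] = {
        "url": start_url,
        "formats": list(formats),
        "render": render,  # False skips the headless browser for static sites
    }
    optional = {
        "limit": limit,                      # assumed name for the page cap
        "depth": depth,                      # assumed name for link depth
        "includePatterns": include_patterns, # assumed name for URL wildcards
        "modifiedSince": modified_since,     # value format is an assumption
        "maxAge": max_age,                   # units are an assumption
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def build_crawl_request(
    account_id: str, api_token: str, body: Dict[str, Any]
) -> urllib.request.Request:
    """Wrap the body in a POST request object without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/accounts/{account_id}/browser-rendering/crawl",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: a bounded, browser-free crawl of a docs subtree.
example_body = build_crawl_body(
    "https://example.com/docs",
    render=False,
    limit=50,
    depth=2,
    include_patterns=["https://example.com/docs/*"],
)
```

Setting both a page limit and a depth bounds the job in two ways, which is the safeguard against runaway crawls that the last bullet describes.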