# Crawlee for Python: Unified Web Scraping and Browser Automation

> Crawlee is a production-grade Python library unifying HTTP and headless browser crawling with built-in proxy rotation, retries, and persistent queues.

Published: 2024-01-10
URL: https://daniliants.com/insights/crawlee-python-web-scraping-browser-automation-python/
Tags: web-scraping, python, playwright, crawler, browser-automation, data-extraction, ai-data, proxy-rotation

---

## Summary

Crawlee is a production-grade Python library by Apify that unifies HTTP and headless browser crawling under a single API. It handles proxy rotation, session management, retries, and persistent URL queues out of the box, making it viable for scraping JavaScript-heavy sites without custom boilerplate. Designed for feeding AI pipelines - LLMs, RAG systems, and vector stores - with structured, reliable data.

## Key Insight

- **Two crawler modes in one lib**: `BeautifulSoupCrawler` for fast HTML parsing (no JS), `PlaywrightCrawler` for JS-heavy pages - swap without restructuring your code
- **Anti-bot by default**: proxy rotation + session management built in; crawlers appear human-like with zero extra config
- **State persistence**: if a crawl is interrupted, it resumes from where it left off - no wasted compute on large jobs
- **Asyncio-native**: built on Python's standard async library, plays well with modern async stacks (FastAPI, aiohttp, etc.)
- **uv-first**: official CLI (`uvx 'crawlee[cli]' create my-crawler`) uses uv for project scaffolding - already the recommended toolchain
- **Modular extras**: install only what you need (`crawlee[beautifulsoup]`, `crawlee[playwright]`, `crawlee[all]`) - keeps environments lean
- **Cloud-ready**: designed to deploy on Apify platform, but runs standalone anywhere
- **Typed codebase**: full type hints mean IDE autocompletion works properly and static analysis catches bugs early