Cloudflare Just Launched the /crawl Endpoint – Crawl Entire Websites with ONE API Call (Open Beta)
Cloudflare dropped a game-changer yesterday (March 10, 2026): the new Browser Rendering /crawl endpoint.
Unlike Markdown for Agents, this Cloudflare feature is free and enabled automatically. Since most Pickaxe users have Cloudflare, this feature already works on their website. It’s just a matter of adding it to the
This is huge for anyone building AI agents, RAG systems, knowledge bases, or data ingestion tools inside Pickaxe. You no longer need to roll your own crawler, manage Puppeteer queues, or fight anti-bot measures — just one API call and you get clean, fully rendered content from an entire site (or any section of it).
What the /crawl Endpoint Actually Does
Submit a starting URL and Cloudflare automatically:
- Discovers pages via sitemaps (including deeply nested sitemaps like the ones Yoast SEO generates on WordPress), internal links, or both
- Renders each page in a real headless browser (JavaScript fully executed) or fast static mode
- Returns the content in the format you want: HTML, Markdown, or structured JSON.
Jobs are asynchronous (fire the request → get a job ID → poll until done). Results are stored for 14 days.
Why This Is Perfect for Pickaxe
- Instant RAG ingestion — Pull clean Markdown or structured JSON from docs sites, blogs, product catalogs, or client websites.
- LLM-ready output — Native Markdown support means fewer tokens and better agent performance.
- Structured data extraction — Use Workers AI to pull exactly what you need (products, FAQs, pricing, etc.) with a prompt + JSON schema.
- Incremental & smart — Only re-crawl changed pages on repeat runs.
- Well-behaved bot: Fully respects robots.txt (including Crawl-delay and Sitemap directives). No angry site owners.
- Handles real-world sites perfectly: depth up to 100,000 links, wildcards for include/exclude, subdomains, external links, custom headers, auth, etc.
Simple Example (Markdown + JSON crawl)
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <YOUR_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/docs",
    "formats": ["markdown", "json"],
    "limit": 200,
    "depth": 5,
    "source": "all",
    "render": true
  }'
You instantly get back a job_id. Then poll it:
curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}' \
  -H 'Authorization: Bearer <YOUR_TOKEN>'
When done, the response contains an array of records with your chosen formats + metadata (title, status, final URL, etc.).
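If you would rather script the submit-then-poll flow than run curl by hand, here is a minimal Python sketch using only the standard library. The endpoint paths come from the curl calls above. Everything about the response shape is an assumption on my part: that the job ID comes back under "result", and that an unfinished job reports a status of "running". The function names are hypothetical too, so check the official reference before relying on any of it.

```python
import json
import time
import urllib.request

API_ROOT = "https://api.cloudflare.com/client/v4/accounts"


def call_api(method, url, token, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until_done(fetch_job, is_done, interval=5.0, max_attempts=120, sleep=time.sleep):
    """Call fetch_job() until is_done(job) is true, sleeping between attempts."""
    for _ in range(max_attempts):
        job = fetch_job()
        if is_done(job):
            return job
        sleep(interval)
    raise TimeoutError(f"job not finished after {max_attempts} attempts")


def crawl(account_id, token, payload):
    """Submit a crawl job, then poll it until it stops reporting 'running'."""
    base = f"{API_ROOT}/{account_id}/browser-rendering/crawl"
    submitted = call_api("POST", base, token, payload)
    job_id = submitted["result"]  # ASSUMPTION: the job ID is returned under "result"
    return poll_until_done(
        fetch_job=lambda: call_api("GET", f"{base}/{job_id}", token)["result"],
        # ASSUMPTION: an unfinished job reports status "running"
        is_done=lambda job: job.get("status") != "running",
    )
```

The polling loop takes the fetch and the "done" check as arguments, so if the real status values turn out to be different you only have to change one lambda. Capping the number of attempts keeps a stuck job from hanging your script forever.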
Key Features & Controls
- formats: ["html"], ["markdown"], ["json"], or any combo
- source: "all" (default), "sitemaps", or "links"
- limit / depth: full control (default limit 10, depth up to 100k)
- render: false, for super-fast static HTML (no browser cost during beta)
- options.includePatterns / excludePatterns: wildcard targeting (e.g. ["**/docs/**"])
- jsonOptions: AI-powered extraction (prompt + schema)
- modifiedSince / maxAge: incremental crawling
- Block images/fonts/stylesheets, custom User-Agent, auth, headers, waitForSelector, etc.
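To make the controls above a bit more concrete, here is a rough Python sketch of a helper that assembles a request body with include/exclude patterns and an optional extraction block. The top-level names (url, formats, options.includePatterns, options.excludePatterns, jsonOptions) are taken from the list above. The helper itself is hypothetical, and the keys inside jsonOptions in the usage example ("prompt" and "schema") are my guess at how "prompt + schema" is spelled, so treat them as placeholders and confirm against the reference.

```python
import json


def build_crawl_body(url, include=None, exclude=None, json_options=None, **extra):
    """Assemble a /crawl request body from the controls listed above."""
    body = {"url": url, "formats": ["markdown"]}

    # Only send an "options" object when there is something to put in it.
    options = {}
    if include:
        options["includePatterns"] = list(include)
    if exclude:
        options["excludePatterns"] = list(exclude)
    if options:
        body["options"] = options

    # Ask for the "json" format only when extraction settings are supplied.
    if json_options is not None:
        body["formats"] = ["markdown", "json"]
        body["jsonOptions"] = dict(json_options)

    # Pass through any other documented parameter (limit, depth, source, ...).
    body.update(extra)
    return body


body = build_crawl_body(
    "https://example.com",
    include=["**/docs/**"],
    # ASSUMPTION: "prompt" and "schema" are placeholder key names.
    json_options={
        "prompt": "Extract every FAQ question and answer",
        "schema": {"type": "object", "properties": {"faqs": {"type": "array"}}},
    },
    limit=50,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted straight into the -d argument of the curl call, which makes it easy to tweak patterns without hand-editing nested quotes.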
Full reference (with every parameter explained):
Official announcement:
Limits (Open Beta)
- Workers Free plan: 5 crawl jobs per day, max 100 pages per crawl (+ 10 min browser time/day)
- Workers Paid plan: Much higher limits (billed by browser hours used — ~$0.09/hr beyond included allowance)
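For a quick feel of what those numbers mean in practice, here is a small back-of-the-envelope calculator in Python. It uses the figures quoted above (5 jobs per day, 100 pages per crawl, roughly $0.09 per browser hour). It assumes a site can be split into non-overlapping 100-page crawls, it ignores the 10 minutes of daily browser time on the free plan (which may run out first), and the included paid allowance is a parameter because the number is not given here.

```python
import math

FREE_JOBS_PER_DAY = 5     # from the limits above
FREE_PAGES_PER_JOB = 100  # from the limits above


def free_plan_schedule(total_pages):
    """Return (jobs_needed, days_needed) to crawl total_pages on the free plan."""
    jobs = math.ceil(total_pages / FREE_PAGES_PER_JOB)
    days = math.ceil(jobs / FREE_JOBS_PER_DAY)
    return jobs, days


def paid_overage_cost(browser_hours, included_hours, rate_per_hour=0.09):
    """Estimate the charge for browser hours beyond the included allowance."""
    billable_hours = max(0.0, browser_hours - included_hours)
    return billable_hours * rate_per_hour


jobs, days = free_plan_schedule(1200)
print(f"1200 pages: {jobs} jobs over {days} days on the free plan")
```

So the free plan tops out at 500 pages a day, and a 1,200-page docs site would need 12 jobs spread over 3 days.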
Quick Note for Website Owners
If you run a site, nothing to turn on. The crawler is polite and honors robots.txt. You can block it via WAF if you want, but most people are leaving it open.
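If you want to see what your own robots.txt would allow before a crawler shows up, Python's standard-library robots.txt parser can check it offline. This is only a sketch: "ExampleCrawler" is a made-up user-agent token (the real one is not given in this post), the rules below are sample data, and the stdlib parser is not guaranteed to interpret every directive exactly the way Cloudflare's crawler does.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration; paste in your own robots.txt instead.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /private/
Crawl-delay: 2

User-agent: *
Disallow:
"""


def load_rules(robots_text):
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleCrawler" is a hypothetical user-agent token used only for this demo.
for path in ("/docs/intro", "/private/notes"):
    allowed = rules.can_fetch("ExampleCrawler", f"https://example.com{path}")
    print(path, "allowed" if allowed else "blocked")

print("crawl delay:", rules.crawl_delay("ExampleCrawler"))
```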
This feels like it could become a core primitive for Pickaxe agents and data sources. Cleaner, cheaper, and more reliable than anything we’ve had before.