This is a magical new feature from Cloudflare that will reduce server load on our websites and token usage on our pickaxe tools.
Here is how it works: the scraping bot can request a markdown version of a webpage rather than the HTML version. That means no nav, no JavaScript, no ads, no noise. All that extra HTML costs tokens to ignore and hosting resources to serve. With this new system, the server serves just the content. For a website like mine, with bots crawling all over it, this could be a big hosting cost saver.
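To make the request side concrete, here is a minimal sketch of what a bot might send. It assumes the server does standard HTTP content negotiation on an `Accept: text/markdown` header and falls back to HTML when it doesn't support it. The function names, the `q=0.8` fallback weight, and the example URL are my own placeholders, not anything taken from Cloudflare or Pickaxe.

```python
from urllib.request import Request, urlopen

# Assumption: markdown is requested via the Accept header, with HTML
# listed as a lower-priority fallback for servers that don't support it.
MARKDOWN_ACCEPT = "text/markdown, text/html;q=0.8"


def build_markdown_request(url: str) -> Request:
    """Build a GET request that asks for markdown first (hypothetical helper)."""
    return Request(url, headers={"Accept": MARKDOWN_ACCEPT})


def is_markdown(content_type: str) -> bool:
    """Return True if a Content-Type header value says the body is markdown."""
    return content_type.split(";")[0].strip().lower() == "text/markdown"


def fetch(url: str, timeout: float = 10.0) -> tuple[str, bool]:
    """Fetch a page; return (body, got_markdown) so the caller can
    fall back to HTML cleaning when the server ignored the request."""
    with urlopen(build_markdown_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
        return body, is_markdown(resp.headers.get("Content-Type", ""))


if __name__ == "__main__":
    # Placeholder URL; swap in a page that actually supports the feature.
    text, got_markdown = fetch("https://example.com/")
    print("markdown" if got_markdown else "html fallback", len(text))
```

Checking the response `Content-Type` matters because a server that doesn't know about the feature will simply return HTML as usual, and the scraper still has to handle that.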
A blog post that requires more than 16,000 tokens in full HTML form drops to just over 3,000 tokens in markdown.
On top of that, it improves the signal-to-noise ratio. Those extra 13,000 HTML tokens are noise the LLM must parse to find the signal.
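For a rough sense of scale, here is the arithmetic as a quick sketch. It uses the rounded figures from this post (16,000 and 3,000 tokens), so the percentages are approximate, and the function name is just something I made up for illustration.

```python
def token_savings(html_tokens: int, markdown_tokens: int) -> tuple[int, float, float]:
    """Return (tokens saved, fraction saved, fraction of the HTML that was signal)."""
    saved = html_tokens - markdown_tokens
    return saved, saved / html_tokens, markdown_tokens / html_tokens


# Rounded figures from the post, not exact measurements.
saved, reduction, signal = token_savings(16_000, 3_000)
print(f"saved {saved} tokens ({reduction:.0%} fewer), signal was {signal:.0%} of the HTML")
```

With these rounded numbers, roughly four out of every five tokens in the HTML version were overhead.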
The next step is to add support for the markdown handshake to the Pickaxe scraper so I can get the token discount in my pickaxes. Over the next few months, I expect more and more websites to add this feature (even if they don’t use Cloudflare) just to save money.