Raw PDFs are one of the fastest ways to burn money: 4,500 words of actual content can turn into 100k+ tokens of headers, footers, fonts, and binary noise. Sometimes that noise is needed, though, when the design matters.
The fix is dead simple: give us a toggle called “Convert user uploaded files to Markdown” that we can set on a Pickaxe-by-Pickaxe basis.
When turned on, Pickaxe would automatically strip the junk and hand the model clean, efficient markdown instead. That one change could cut document-related tokens by 10–20x for most Pickaxes with zero extra work from the user.
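As a rough sanity check on the 10–20x figure, here is the arithmetic for the 4,500-word example. It assumes about 0.75 English words per token, which is a common rule of thumb and not a measured value; real counts depend on the model's tokenizer and on the document.

```python
# Assumption: ~0.75 English words per token (rule of thumb, not a measurement).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate how many tokens clean text of this length would need."""
    return round(word_count / WORDS_PER_TOKEN)


def reduction_factor(raw_tokens: int, word_count: int) -> float:
    """How many times smaller the clean text is than the raw upload."""
    return raw_tokens / estimate_tokens(word_count)


clean_tokens = estimate_tokens(4_500)        # about 6,000 tokens
factor = reduction_factor(100_000, 4_500)    # about 16.7x
print(clean_tokens, round(factor, 1))
```

Under that assumption the example lands at roughly 17x, inside the 10–20x range.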
There are already battle-tested open-source packages that do this reliably.
Make it optional though.
Some workflows need the original layout and images. For my authors uploading manuscripts, Markdown is perfect. But for my web-page scanner that reads PDF screenshots, a Markdown conversion would break the tool. A simple checkbox keeps full control for those cases.
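To make the request concrete, here is a minimal sketch of how a per-Pickaxe toggle could behave. Every name in it (`PickaxeSettings`, `convert_uploads_to_markdown`, `prepare_upload`) is hypothetical and not part of Pickaxe's actual API, and the converter is passed in as a plain function so any open-source PDF-to-Markdown package could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class PickaxeSettings:
    # Hypothetical per-Pickaxe setting; off by default so existing tools keep working.
    convert_uploads_to_markdown: bool = False


def prepare_upload(
    file_bytes: bytes,
    settings: PickaxeSettings,
    to_markdown: Callable[[bytes], str],
) -> Union[bytes, str]:
    """Return Markdown text when the toggle is on, otherwise the untouched file."""
    if settings.convert_uploads_to_markdown:
        return to_markdown(file_bytes)
    return file_bytes


# Stand-in converter for illustration only; a real one would parse the PDF.
def fake_converter(data: bytes) -> str:
    return "# Chapter 1\n\nClean manuscript text."


pdf = b"%PDF-1.7 ...binary noise..."
manuscript_tool = PickaxeSettings(convert_uploads_to_markdown=True)
screenshot_tool = PickaxeSettings()  # needs the original layout and images

as_markdown = prepare_upload(pdf, manuscript_tool, fake_converter)
as_original = prepare_upload(pdf, screenshot_tool, fake_converter)
```

The manuscript Pickaxe gets clean Markdown, while the screenshot scanner still receives the original bytes.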
This feels like classic Pickaxe: one small setting that delivers massive, automatic wins for new users while letting power users decide exactly how they want to run.