I have a client that wants to add 60+ PDFs to a Pickaxe knowledge base. Is it best to combine the PDFs into one big PDF, or to add them individually?
@uapllc Short answer: Add them individually, not as one mega-PDF.
Adding individually wins because:
- Better recall & relevance >> The model can anchor responses to a specific doc (“Source: Pricing-Guide-2024.pdf”) instead of a giant omnibus.
- Easier maintenance >> You can replace/update just one PDF when it changes, with no re-uploading of a 300-page bundle.
- Cleaner metadata >> Per-file titles, tags, and descriptions improve retrieval and let you exclude/include specific docs per tool.
- Faster indexing & fewer failures >> One bad page won’t force you to reprocess everything.
- Access control >> You can attach only the relevant subset to each Pickaxe/tool.
When to combine
- If you have many micro-files (e.g., 60 one-pagers in a single series), combine into logical packs (e.g., 5-10 PDFs by topic). That reduces clutter while keeping updates manageable.
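If it helps to plan the packs before merging anything, here is a minimal Python sketch. It only groups file names by topic; it does not merge PDFs. The file names and the topic mapping are made up for illustration, so swap in your own.

```python
from collections import defaultdict


def plan_packs(topic_by_file: dict[str, str]) -> dict[str, list[str]]:
    """Group file names by topic; each topic becomes one combined pack."""
    packs: dict[str, list[str]] = defaultdict(list)
    for filename, topic in sorted(topic_by_file.items()):
        packs[topic].append(filename)
    return dict(packs)


# Hypothetical one-pagers and their topics (assumed, not from a real KB).
example = {
    "onboarding-01.pdf": "Onboarding",
    "onboarding-02.pdf": "Onboarding",
    "pricing-01.pdf": "Pricing",
}

packs = plan_packs(example)
for topic, files in packs.items():
    print(f"{topic}: {len(files)} file(s) to combine")
```

A plan like this makes it easy to check that you end up with a handful of topic packs rather than one giant bundle.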
Prep checklist (saves time + boosts accuracy)
- Name clearly: Category – Title – vYYYY-MM.pdf.
- Trim noise: remove repeated headers/footers and scanned artifacts; ensure OCR is clean.
- Add a cover/title page with a one-paragraph summary and keywords.
- Group by topic (if you do packs): one topic per file.
- Test a few uploads first to confirm the KB surfaces them well before batching the rest.
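For the naming step, a small Python sketch along these lines could build and check names in the Category – Title – vYYYY-MM.pdf pattern before you upload. The helper names and the sample category/title are my own assumptions, not anything Pickaxe requires.

```python
import re
from datetime import date

# Matches "Category – Title – vYYYY-MM.pdf" (en dashes, as in the checklist).
NAME_PATTERN = re.compile(
    r"^(?P<category>[^–]+) – (?P<title>[^–]+) – "
    r"v(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])\.pdf$"
)


def build_name(category: str, title: str, when: date) -> str:
    """Build a file name in the Category – Title – vYYYY-MM.pdf pattern."""
    return f"{category.strip()} – {title.strip()} – v{when:%Y-%m}.pdf"


def follows_convention(filename: str) -> bool:
    """Return True if the file name already matches the pattern."""
    return NAME_PATTERN.match(filename) is not None


# Hypothetical example values.
name = build_name("Pricing", "Pricing Guide", date(2024, 3, 1))
print(name, follows_convention(name))
print("Pricing-Guide-2024.pdf", follows_convention("Pricing-Guide-2024.pdf"))
```

Running the check over the whole folder first is a quick way to spot the files that still need renaming.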
How to load 60+ quickly
- Use the Studio UI for drag-and-drop in batches, or
- Use automation (Make/Zapier/n8n) with the Pickaxe “Create Document” action / Studio API to bulk load and tag.
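If you script the load yourself, the batching part might look like the Python sketch below. The `upload` argument is a placeholder you would supply (for example, a function that calls whatever automation or API you actually use); nothing here is a real Pickaxe call.

```python
from pathlib import Path
from typing import Callable, Iterator


def batches(items: list[Path], size: int) -> Iterator[list[Path]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_load(folder: Path, upload: Callable[[Path], None], size: int = 10) -> int:
    """Upload every PDF in `folder` in batches; return how many were sent."""
    pdfs = sorted(folder.glob("*.pdf"))
    for number, group in enumerate(batches(pdfs, size), start=1):
        print(f"Batch {number}: {len(group)} file(s)")
        for pdf in group:
            upload(pdf)  # placeholder: your own upload function goes here
    return len(pdfs)
```

Small batches make it easier to stop and check how the KB handles the first few files before sending the rest.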
-Ned
To help reduce token costs for frequent tasks, here is something I do a lot for clients. I combine clusters of PDFs into one combined PDF, then feed that PDF into a custom app that removes the clutter and reduces the large document to a list of core insights. The app is trained to be extremely thorough and to capture every key insight and valuable piece of information from the combined document. I then put those insights into a single text document and use that text doc as a data source for my apps. So far it has done an amazing job and reduced overall token costs for my clients.
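Roughly, the shape of that workflow could be sketched in Python like this. The distillation step is passed in as a function, and `dedupe_lines` is only a trivial stand-in (it drops blank and repeated lines), not the custom app described above. The sample text is invented.

```python
from typing import Callable


def build_datasource(cluster_texts: list[str],
                     distill: Callable[[str], list[str]]) -> str:
    """Combine a cluster's text, distill it, and return one text document."""
    combined = "\n\n".join(cluster_texts)
    insights = distill(combined)
    return "\n".join(f"- {insight}" for insight in insights)


def dedupe_lines(text: str) -> list[str]:
    """Placeholder distiller: keep each non-empty line once, in order."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


# Hypothetical text already extracted from two PDFs in one cluster.
docs = [
    "ACME Corp\nRefunds are issued within 30 days.",
    "ACME Corp\nSupport hours are 9 to 5.",
]
datasource = build_datasource(docs, dedupe_lines)
print(datasource)
```

In practice you would replace `dedupe_lines` with whatever does the real distilling, then save the returned string as the text doc you attach as a data source.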