KB File Organization

I have a client who wants to add 60+ PDFs to a Pickaxe knowledge base. Is it best to combine the PDFs into one big PDF, or to add them individually? :thinking:

2 Likes

@uapllc Short answer: Add them individually, not as one mega-PDF.

Adding individually wins because:

  • Better recall & relevance >> The model can anchor responses to a specific doc (“Source: Pricing-Guide-2024.pdf”) instead of a giant omnibus.
  • Easier maintenance >> You can replace/update just one PDF when it changes, with no re-uploading of a 300-page bundle.
  • Cleaner metadata >> Per-file titles, tags, and descriptions improve retrieval and let you exclude/include specific docs per tool.
  • Faster indexing & fewer failures >> One bad page won’t force you to reprocess everything.
  • Access control >> You can attach only the relevant subset to each Pickaxe/tool.

When to combine

  • If you have many micro-files (e.g., 60 one-pagers in a single series), combine into logical packs (e.g., 5-10 PDFs by topic). That reduces clutter while keeping updates manageable.
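
If it helps to plan the packs before merging anything, here is a minimal sketch that groups file names by topic. It assumes each name starts with its topic followed by " - " (my assumption, adjust the separator to your own naming); `plan_packs` and the sample names are made up for illustration. It only plans the grouping and does not merge PDFs.

```python
from collections import defaultdict
from pathlib import PurePath


def plan_packs(filenames, sep=" - "):
    """Group file names into packs keyed by the text before the first separator."""
    packs = defaultdict(list)
    for name in filenames:
        stem = PurePath(name).stem
        # Files without a topic prefix fall into a catch-all pack.
        topic = stem.split(sep, 1)[0].strip() if sep in stem else "Misc"
        packs[topic].append(name)
    # Sort topics and names so the plan is stable and easy to review.
    return {topic: sorted(names) for topic, names in sorted(packs.items())}


# Hypothetical sample names, not real client files.
files = [
    "Pricing - Starter Plan.pdf",
    "Pricing - Enterprise Plan.pdf",
    "Onboarding - Day One.pdf",
    "FAQ.pdf",
]
packs = plan_packs(files)
for topic, names in packs.items():
    print(f"{topic}: {len(names)} file(s)")
```

Each key in `packs` is a candidate combined PDF, and anything that lands in "Misc" is probably better uploaded on its own.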

Prep checklist (saves time + boosts accuracy)

  • Name clearly: Category – Title – vYYYY-MM.pdf.
  • Trim noise: remove repeated headers/footers and scanned artifacts; ensure OCR is clean.
  • Add a cover/title page with a one-paragraph summary and keywords.
  • Group by topic (if you do packs): one topic per file.
  • Test a few uploads first to confirm the KB surfaces them well before batching the rest.
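
For the naming step, a small helper can keep 60+ files consistent. This is only a sketch of the "Category – Title – vYYYY-MM.pdf" pattern from the checklist; `kb_filename` is a hypothetical helper, and the set of characters it strips is my assumption about what is unsafe in file names.

```python
import re
from datetime import date


def kb_filename(category, title, version_date):
    """Build 'Category – Title – vYYYY-MM.pdf' from its parts."""
    def clean(text):
        # Drop characters that are commonly unsafe in file names (assumed set),
        # then collapse runs of whitespace.
        text = re.sub(r'[\\/:*?"<>|]', "", text)
        return re.sub(r"\s+", " ", text).strip()

    return f"{clean(category)} – {clean(title)} – v{version_date:%Y-%m}.pdf"


name = kb_filename("Pricing", "Guide:  2024 Edition", date(2024, 3, 15))
print(name)
```

Putting the version at the end keeps files for the same category and title next to each other when sorted, which makes it easier to spot which PDF to replace.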

How to load 60+ quickly

  • Use the Studio UI for drag-and-drop in batches, or
  • Use automation (Make/Zapier/n8n) with the Pickaxe “Create Document” action / Studio API to bulk load and tag.
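
If you script the load yourself, the batching logic might look roughly like the sketch below. It deliberately does not call any Pickaxe endpoint: `upload` is a hypothetical callback you would supply (for example, a function wrapping your Make/Zapier/n8n webhook or API call), and `fake_upload` exists only to demo the flow.

```python
from pathlib import Path


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_in_batches(paths, upload, size=10):
    """Call the caller-supplied `upload` on each file, batch by batch.

    `upload` is a hypothetical stand-in for whatever actually sends the file.
    Returns the paths that raised, so they can be retried on their own.
    """
    failed = []
    for number, batch in enumerate(batched(sorted(paths), size), start=1):
        print(f"batch {number}: {len(batch)} file(s)")
        for path in batch:
            try:
                upload(path)
            except Exception:
                # One bad file should not stop the rest of the load.
                failed.append(path)
    return failed


# Demo with a fake uploader that rejects one file.
def fake_upload(path):
    if path.name == "broken.pdf":
        raise RuntimeError("could not process")


paths = [Path(f"doc{i:02d}.pdf") for i in range(23)] + [Path("broken.pdf")]
failed = load_in_batches(paths, fake_upload, size=10)
batch_sizes = [len(b) for b in batched(sorted(paths), 10)]
```

Collecting failures instead of stopping at the first one matches the point above: one bad file gets retried alone rather than forcing the whole set to be reprocessed.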

-Ned

4 Likes

To help reduce token costs for frequent tasks, here is something I do a lot for clients. I combine clusters of PDFs into one combined PDF, then feed that PDF into a custom app that removes the clutter and reduces the large document to a list of core insights. The app is trained to be extremely thorough and to gather every key insight and valuable piece of information from the combined document. I then put those insights into a single text document and use that text doc as a data source for my apps. So far it has done an amazing job and has reduced overall token costs for my clients.
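
The last step (merging the extracted insights into one text doc) can be sketched as below. This is not the custom app itself: the insight extraction is assumed to have already happened, the sample insights are invented, and the 4-characters-per-token figure is only a rough rule of thumb for English text.

```python
def merge_insights(insight_lists):
    """Flatten several insight lists, dropping case-insensitive duplicates in order."""
    seen = set()
    merged = []
    for insights in insight_lists:
        for line in insights:
            text = " ".join(line.split())  # normalize whitespace
            key = text.lower()
            if text and key not in seen:
                seen.add(key)
                merged.append(text)
    return merged


def rough_tokens(text):
    """Very rough size estimate: about 4 characters per token (assumption)."""
    return len(text) // 4


# Invented sample insights standing in for the app's output per PDF cluster.
cluster_a = ["Refunds are issued within 14 days.", "Support hours are 9-5 EST."]
cluster_b = ["refunds are issued within  14 days.", "", "Enterprise plans include SSO."]

merged = merge_insights([cluster_a, cluster_b])
datasource = "\n".join(f"- {line}" for line in merged)
print(datasource)
print("approx tokens:", rough_tokens(datasource))
```

Comparing `rough_tokens` for the original text against the condensed doc gives a quick sanity check on how much smaller the data source actually is.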

1 Like