Regarding web pages that cannot be imported as knowledge

I can’t scrape the client’s website. Is there a good solution?

Hi @avakero ,
Thank you for reaching out. I took a closer look at the issue you are seeing with importing https://kuyou-qa.com/ into your knowledge base.

When adding the homepage URL directly, the importer returns a “failed to fetch” error. This typically happens when the root page is slow to respond or times out during the initial single-page fetch request.

However, when I tested one of the subpages instead, the import completed successfully. For example, using:

https://kuyou-qa.com/compare-funeral-types-and-services/

allowed the importer to scrape that page and also retrieve the homepage as part of the crawl.

Even when the homepage scrape itself fails, starting from a subpage often still lets the crawler gather the main site’s content for the knowledge base.
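If you want to check whether the homepage itself is the slow link, here is a minimal sketch for timing page responses. This is my own script, not a Pickaxe tool, and the 10-second threshold is an assumption about the importer's timeout, not a documented limit:

```python
# A minimal sketch (not a Pickaxe tool; the 10-second timeout is an assumed
# threshold) for timing how quickly each page responds, to spot URLs that
# are likely to fail the importer's initial fetch.
import time
import urllib.request

FETCH_TIMEOUT = 10.0  # assumed threshold in seconds

def classify(elapsed, timeout=FETCH_TIMEOUT):
    """Label a response time as 'ok' or 'slow' relative to the timeout."""
    return "ok" if elapsed < timeout else "slow"

def time_fetch(url, timeout=FETCH_TIMEOUT):
    """Fetch a URL, returning (HTTP status or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1024)  # read a little to confirm the server is sending data
            return resp.status, time.monotonic() - start
    except Exception:
        return None, time.monotonic() - start

# Example: compare the homepage against a subpage that imported successfully.
# for url in ("https://kuyou-qa.com/",
#             "https://kuyou-qa.com/compare-funeral-types-and-services/"):
#     status, elapsed = time_fetch(url)
#     print(url, status, f"{elapsed:.1f}s", classify(elapsed))
```

If the homepage comes back "slow" while subpages come back "ok", that matches the behavior I saw when testing your site.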

Please try importing one of the subpages rather than the homepage URL. If you continue to run into issues, please let us know.


Hi Danny,

Thank you for the detailed investigation.

The page I’d like Pickaxe to learn is this one: (https://kuyou-qa.com/all-qa/)

I created a simple page based on your advice and added a link to it, but unfortunately it seems to time out and fail. (Knowledge base for AI - 葬Qナビ, a comprehensive Q&A site on funerals, memorial services, and end-of-life planning)

New articles are added to this site every day. Do you have any advice on the best way to manage the knowledge base, including future maintenance?

According to the client, they’re anticipating around 20,000 articles over the next two years. While it’s quite surprising, is it realistically possible to handle this volume? (Is there actually a way to manage this…?)

Here’s a status update.

When I enter the URL in column 1, the title in column 2, and the content in column 3 in the spreadsheet, it now responds correctly. (It didn’t work when columns 1 and 2 were reversed.)
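For reference, here’s a small sketch of the column layout that finally worked for me. The header names and file name are my own choices; as far as I can tell, only the column order matters:

```python
# A minimal sketch of the column order that worked: URL in column 1, title in
# column 2, page content in column 3. The header names and file name are
# illustrative only.
import csv

rows = [
    ("https://kuyou-qa.com/all-qa/",
     "All Q&A index",
     "Paste the full article text here..."),
]

with open("knowledge.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "title", "content"])  # columns 1, 2, 3 in this order
    writer.writerows(rows)
```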

I’ve decided to give up on scraping URLs from large-scale websites.

Thank you for your help.

Hi @avakero,
Thank you for the follow-up message. I want to clarify a few things, especially in case my earlier explanation caused confusion.

When I suggested starting from a subpage rather than the homepage, I meant only that a slow or timing-out homepage can sometimes block the initial fetch, while a fast-loading subpage may allow the scraper to begin its crawl. In my case, adding:

https://kuyou-qa.com/compare-funeral-types-and-services/

successfully allowed the importer to collect many pages across the site, including /all-qa/ and the homepage.


From your latest message, it looks like you are now experimenting with managing content using a spreadsheet. This approach can work well if you paste the actual content of the subpages into the spreadsheet, because Pickaxe will read the text directly. However, if the spreadsheet relies on the URLs themselves to fetch page content, it will not bypass the same loading and timeout limitations that affect web scraping.
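If you do go the spreadsheet route, one workaround is to fetch and strip each article yourself before pasting the text into the content column. This is a rough sketch of an assumed workflow, not a Pickaxe feature, and it uses only a basic HTML-to-text pass:

```python
# A rough sketch of an assumed workflow (not a Pickaxe feature): fetch each
# article yourself, strip the HTML down to plain text, and paste that text
# into the spreadsheet's content column so the import never depends on
# Pickaxe fetching the URL.
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    """Reduce an HTML document to its visible text, joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

def fetch_text(url, timeout=30):
    """Download a page and return its visible text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return page_text(resp.read().decode("utf-8", errors="replace"))
```

Because you control the fetch here, a slow page just means the script waits longer; it does not fail the import the way the built-in scraper does.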