Some Background (For Context :): I’ve been stress-testing the new RSS import feature. My RSS feed has 500 blog posts, and I’m working to import them into the Knowledge section of a pickaxe.
My web host limits concurrent connections to 100, so about 400 of the imported pages came back as a single chunk that says `{'message': 'Maximum concurrency allowed 100'}`.
So I imported the feed again. This time it scraped more pages successfully, but it still hit errors, and it also re-scraped some pages it had already imported. After a few rounds of this, I have most of my blog posts imported, but I also have a bunch of duplicates.
Question: Is there an automated way to remove the duplicates, or to skip scraping URLs that are already in Knowledge?
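In the meantime, the workaround I’m considering is pre-filtering the feed myself before each retry. This is just a sketch of the idea, and it assumes I can get a plain list of the URLs already in Knowledge (e.g. copied out of the dashboard); `filter_new_urls` and `normalize` are names I made up, not anything from Pickaxe:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    # Normalize a URL so trivial variants compare equal:
    # lowercase the scheme and host, drop a trailing slash and fragment.
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )

def filter_new_urls(feed_urls, known_urls):
    # Return only the feed URLs not already imported,
    # preserving feed order and dropping in-feed duplicates too.
    seen = {normalize(u) for u in known_urls}
    new = []
    for url in feed_urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            new.append(url)
    return new
```

So each retry would only submit the leftover URLs (keeping the batch under the 100-connection limit), instead of re-scraping the whole feed and piling up duplicates.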