Bring Back and Improve Scheduled Scraping for Knowledge Files (with Webhook Support)

Summary:
Reintroduce and upgrade the ability to schedule website scraping for Knowledge Files. Enhance it with better control, visibility, and webhook support—making it a reliable, flexible system for keeping AI agents synced with live external data.


The Problem:
Today, scraping a URL into a Knowledge File is a one-time event. If the source changes, the data goes stale and must be manually refreshed. That’s unsustainable for creators using Pickaxe with:

  • Dynamic web content (blogs, SOPs, live docs)

  • AI agents that depend on accuracy

  • Workflows that rely on data staying current

The old scheduling feature helped—but it disappeared. Now we’re asking not just for its return, but for it to be rebuilt right.


Requested Features:

1. Scheduling Options (Per URL):

When uploading or managing a website Knowledge File (a rough schedule sketch follows the list below):

  • Add scrape frequency: Manual, Daily, Weekly, Monthly

  • Allow creators to set time of day for scraping

  • Optionally retain or overwrite previous content (with version tagging)
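
A rough sketch of what such a per-URL schedule could look like as a data structure (field names are illustrative only, not an actual Pickaxe schema):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScrapeFrequency(str, Enum):
    MANUAL = "manual"
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass
class ScrapeSchedule:
    """Hypothetical per-URL scrape schedule for a website Knowledge File."""
    source_url: str
    frequency: ScrapeFrequency = ScrapeFrequency.MANUAL
    run_at: str = "06:00"              # time of day, studio-local timezone
    overwrite_previous: bool = True    # False = keep prior versions
    version_tag: Optional[str] = None  # e.g. "2025-06-01" when retaining history


# Example: re-scrape a docs page daily at 6am, keeping tagged versions
schedule = ScrapeSchedule(
    source_url="https://example.com/docs/sop",
    frequency=ScrapeFrequency.DAILY,
    run_at="06:00",
    overwrite_previous=False,
    version_tag="auto",
)
```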

2. Webhook Trigger Support (New):

After a successful or failed scrape, Pickaxe should offer an outbound webhook option:

  • Trigger a custom URL (e.g., n8n, Zapier, Make, or internal system)

  • Include a payload: file ID, timestamp, scrape status, and a diff summary if applicable (see the sketch below)

  • Support automation such as:

    • Logging to a Notion dashboard

    • Sending a Slack message

    • Re-running an agent or regenerating a report

Example Use Case:
Scrape site at 6am → webhook hits n8n → n8n notifies Slack + updates a Google Sheet + pings an AI workflow.
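
A minimal sketch of what the receiving end of that webhook could look like (the payload fields mirror the list above but are assumptions, not a published Pickaxe schema; Flask stands in for whatever endpoint your n8n/Make/Zapier or internal system exposes):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/pickaxe/scrape-webhook", methods=["POST"])
def handle_scrape_webhook():
    """Receive a (hypothetical) post-scrape notification."""
    event = request.get_json(force=True)

    file_id = event.get("file_id")            # which Knowledge File was scraped
    timestamp = event.get("timestamp")        # when the scrape finished
    status = event.get("status")              # "success", "fail", or "skipped"
    diff_summary = event.get("diff_summary")  # present only when content changed

    # From here an automation could log to Notion, post to Slack,
    # update a Google Sheet, or re-run an agent.
    if status == "success" and diff_summary:
        print(f"{timestamp}: file {file_id} changed -> {diff_summary}")
    elif status == "fail":
        print(f"{timestamp}: scrape of file {file_id} failed, alerting owner")

    return jsonify({"received": True}), 200


if __name__ == "__main__":
    app.run(port=8000)
```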

3. After-Scrape Diff Reporting (Optional but Strongly Recommended):

Generate a report and optionally send it to the Studio Owner/Controller:

  • How many chunks were added/changed/removed

  • Diff summary in plain text (see the sketch after this list)

  • Timestamp + link to updated Knowledge File
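
The chunk-level counts and plain-text summary could be produced along these lines (a sketch using Python's difflib, assuming both scrapes are already split into chunks; the chunking itself is Pickaxe-internal):

```python
import difflib


def diff_summary(old_chunks: list[str], new_chunks: list[str]) -> dict:
    """Count added/changed/removed chunks between two scrapes and
    produce a short plain-text summary."""
    matcher = difflib.SequenceMatcher(a=old_chunks, b=new_chunks)
    added = changed = removed = 0

    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":
            added += j2 - j1
        elif op == "delete":
            removed += i2 - i1
        elif op == "replace":
            changed += max(i2 - i1, j2 - j1)

    return {
        "added": added,
        "changed": changed,
        "removed": removed,
        "text": f"{added} chunks added, {changed} changed, {removed} removed",
    }


# Example: one chunk edited, one chunk appended
print(diff_summary(["intro", "pricing v1", "faq"],
                   ["intro", "pricing v2", "faq", "changelog"])["text"])
```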

4. Scrape Health Monitoring:

  • Log scrape attempts with status (success, fail, skipped)

  • Retry logic (e.g., up to 3 attempts with exponential backoff; see the sketch after this list)

  • Email or in-app alert if a scrape fails repeatedly
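
The retry behaviour could look roughly like this (a sketch only: three attempts with exponential backoff, where run_scrape stands in for whatever Pickaxe's scraper actually does):

```python
import logging
import time

logger = logging.getLogger("scrape-health")


def scrape_with_retries(url: str, run_scrape, max_attempts: int = 3) -> str:
    """Attempt a scrape up to max_attempts times, backing off 1s, 2s, 4s...
    Logs every attempt so failures show up in the scrape history."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = run_scrape(url)
            logger.info("scrape success url=%s attempt=%d", url, attempt)
            return result
        except Exception as exc:
            logger.warning("scrape fail url=%s attempt=%d error=%s", url, attempt, exc)
            if attempt == max_attempts:
                # After the final failure, alert the studio owner (email / in-app)
                logger.error("scrape giving up url=%s, alerting owner", url)
                raise
            time.sleep(2 ** (attempt - 1))  # 1s, 2s, 4s backoff
```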


Why This Matters:
This feature turns static data into living knowledge, essential for agents supporting real-time or frequently updated domains. Adding webhook support also unlocks serious integration power for automation-minded users—without forcing them into brittle scraping workarounds outside Pickaxe.


Final Thought:
Bring scraping back—but make it programmable, transparent, and reliable. Knowledge is only useful when it’s current. Let us keep it that way, on our own terms.

Hey @taedog2020, here’s more insight into why the scheduled scraping was rolled back in V2:

That’s why I said add a webhook: have n8n or something similar do the scrape and update the KB.

@taedog2020 You can do that now. Just add a webhook or connect an MCP server and configure your n8n or Make scenario. It’s already possible within Pickaxe.

So it will dynamically update the KB in the studio?

That depends on your scenario. You can set up a scenario with a proxied scraper and set refresh intervals that trigger a website scrape.

For example:
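
A minimal sketch of such a scenario in plain Python, standing in for an n8n/Make flow (the Pickaxe update endpoint and API key below are placeholders, not a documented API; the idea is simply: on a schedule, scrape the page and push the fresh content into the KB):

```python
import time

import requests

SOURCE_URL = "https://example.com/docs"                              # page to keep in sync
KB_UPDATE_URL = "https://example-pickaxe-endpoint.invalid/kb/update"  # placeholder, not a real endpoint
API_KEY = "YOUR_API_KEY"                                             # placeholder credential
REFRESH_SECONDS = 24 * 60 * 60                                       # e.g. re-scrape once a day


def scrape(url: str) -> str:
    """Fetch the page; a real scenario would route this through a proxied scraper."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def push_to_knowledge_base(content: str) -> None:
    """Send the scraped content to the studio's KB update hook (placeholder)."""
    requests.post(
        KB_UPDATE_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"content": content},
        timeout=30,
    )


if __name__ == "__main__":
    while True:
        push_to_knowledge_base(scrape(SOURCE_URL))
        time.sleep(REFRESH_SECONDS)
```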