Summary:
Reintroduce and upgrade the ability to schedule website scraping for Knowledge Files. Enhance it with better control, visibility, and webhook support, making it a reliable, flexible system for keeping AI agents synced with live external data.
The Problem:
Today, scraping a URL into a Knowledge File is a one-time event. If the source changes, the data goes stale and must be manually refreshed. That’s unsustainable for creators using Pickaxe with:
- Dynamic web content (blogs, SOPs, live docs)
- AI agents that depend on accuracy
- Workflows that rely on data staying current
The old scheduling feature helped—but it disappeared. Now we’re asking not just for its return, but for it to be rebuilt right.
Requested Features:
1. Scheduling Options (Per URL):
When uploading or managing a website Knowledge File:
- Add scrape frequency options: Manual, Daily, Weekly, Monthly
- Allow creators to set the time of day for scraping
- Optionally retain or overwrite previous content (with version tagging)
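As a rough sketch of how a per-URL schedule could be represented internally, the options above map cleanly onto standard cron expressions. The `ScrapeSchedule` shape and field names below are hypothetical, not an existing Pickaxe API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeSchedule:
    frequency: str                 # "manual" | "daily" | "weekly" | "monthly"
    hour: int = 6                  # creator-set time of day (0-23)
    retain_versions: bool = True   # keep prior content under a version tag


def to_cron(s: ScrapeSchedule) -> Optional[str]:
    """Translate a schedule into a 5-field cron expression, or None for manual."""
    if s.frequency == "manual":
        return None                      # no automatic runs
    if s.frequency == "daily":
        return f"0 {s.hour} * * *"       # every day at :00
    if s.frequency == "weekly":
        return f"0 {s.hour} * * 1"       # e.g., Mondays
    if s.frequency == "monthly":
        return f"0 {s.hour} 1 * *"       # 1st of each month
    raise ValueError(f"unknown frequency: {s.frequency}")
```

Manual returning `None` keeps the one-off behavior that exists today, while any other frequency can be handed straight to a scheduler.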
2. Webhook Trigger Support (New):
After a successful or failed scrape, Pickaxe should offer an outbound webhook option:
- Trigger a custom URL (e.g., n8n, Zapier, Make, or an internal system)
- Include a payload: file ID, timestamp, scrape status, and a diff summary if applicable
- Support automation such as:
  - Logging to a Notion dashboard
  - Sending a Slack message
  - Re-running an agent or regenerating a report

Example Use Case:
Scrape a site at 6am → webhook hits n8n → n8n notifies Slack, updates a Google Sheet, and pings an AI workflow.
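A minimal sketch of what the outbound call could look like, assuming the payload fields listed above and an HMAC signature header so receivers like n8n can verify the sender (the field and header names are illustrative, not an official schema):

```python
import hashlib
import hmac
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def build_payload(file_id: str, status: str,
                  diff_summary: Optional[str] = None) -> dict:
    """Assemble the webhook body: file ID, timestamp, status, optional diff."""
    return {
        "file_id": file_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,              # "success" | "fail"
        "diff_summary": diff_summary,  # None when no diff is applicable
    }


def send_webhook(url: str, secret: str, payload: dict,
                 timeout: float = 10.0) -> int:
    """POST the JSON payload, signed with HMAC-SHA256, and return the HTTP status."""
    body = json.dumps(payload).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "X-Signature-SHA256": signature},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

On the receiving side, n8n/Zapier/Make can recompute the HMAC over the raw body and compare it to the header before trusting the event.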
3. After-Scrape Diff Reporting (Optional but Strongly Recommended):
Generate a report and optionally send it to the Studio Owner/Controller:
- How many chunks were added, changed, or removed
- A diff summary in plain text
- A timestamp plus a link to the updated Knowledge File
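One simple way to produce those counts is to hash each chunk and compare the old and new lists positionally: same slot with different content counts as changed, surplus new slots as added, surplus old slots as removed. This is a sketch under that assumption, not Pickaxe's actual chunking logic:

```python
import hashlib


def _h(text: str) -> str:
    """Content hash for a single chunk."""
    return hashlib.sha256(text.encode()).hexdigest()


def chunk_diff(old: list, new: list) -> dict:
    """Count added/changed/removed chunks between two scrapes."""
    common = min(len(old), len(new))
    changed = sum(1 for i in range(common) if _h(old[i]) != _h(new[i]))
    return {
        "added": max(len(new) - len(old), 0),
        "changed": changed,
        "removed": max(len(old) - len(new), 0),
    }


def summary_line(d: dict) -> str:
    """Plain-text one-liner for the owner report or the webhook payload."""
    return f"{d['added']} added, {d['changed']} changed, {d['removed']} removed"
```

A positional comparison is crude (an inserted chunk shifts everything after it), but it is cheap and good enough for a headline summary; a real implementation might match chunks by stable IDs instead.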
4. Scrape Health Monitoring:
- Log scrape attempts with status (success, fail, skipped)
- Retry logic (e.g., 3 attempts with exponential backoff)
- Email or in-app alert if a scrape fails repeatedly
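The retry-and-log behavior above can be sketched as a small wrapper: attempt the scrape up to three times, doubling the delay between attempts, logging each outcome, and only surfacing the failure (where an alert would fire) once all attempts are exhausted. The `scrape` callable and alert hook are placeholders:

```python
import logging
import time

log = logging.getLogger("scrape")


def scrape_with_retries(scrape, url: str, attempts: int = 3,
                        base_delay: float = 2.0):
    """Run scrape(url) with exponential backoff (e.g., 2s, 4s between tries).

    Logs every attempt; raises after the final failure, which is where an
    email or in-app alert would be triggered.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = scrape(url)
            log.info("scrape success url=%s attempt=%d", url, attempt)
            return result
        except Exception as exc:
            log.warning("scrape fail url=%s attempt=%d err=%s", url, attempt, exc)
            if attempt == attempts:
                # Hook point: send the repeated-failure alert here.
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Each attempt leaves a log line regardless of outcome, which gives the per-URL health history the monitoring section asks for.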
Why This Matters:
This feature turns static data into living knowledge, essential for agents supporting real-time or frequently updated domains. Adding webhook support also unlocks serious integration power for automation-minded users—without forcing them into brittle scraping workarounds outside Pickaxe.
Final Thought:
Bring scraping back—but make it programmable, transparent, and reliable. Knowledge is only useful when it’s current. Let us keep it that way, on our own terms.