I am trying to scan either an online PDF or a website link. What is the best approach to scan and pull data from a webpage and put it into the final output? What are the best AI models to do this?
Hi @rainmaker, in a form Pickaxe, both the document upload field and the normal short-text user input field can accept URLs.
Instead of prompting for a PDF URL, use the document upload field and guide users (with placeholder text) on what to upload (tip: encourage PDF doc uploads).
@Ned.Malki Thanks, but an issue I am running into with just the website URL that needs to be scraped is that Pickaxe AI can’t read the URL from the form (this is the error I receive). Do you have any ideas on how to crawl it and pull the requested data from the URL?
@rainmaker I hear you. The native scraper can run into issues when pulling text from a page because of Cloudflare blockers set up by site admins (or a host of other reasons that prevent successful scraping).
An alternative method would be to set up an automation that uses a brute-force scraper like Apify or Firecrawl, then returns the clean data to your Pickaxe.
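For anyone wiring up the scraper step by hand, here is a rough Python sketch of what that automation could look like with Firecrawl. It assumes Firecrawl's hosted v1 scrape endpoint and response shape (`success`, `data.markdown`), so check both against their current docs before relying on it. The function names and the `max_chars` trim are my own, purely for illustration, and the hand-off to the Pickaxe is not shown since that depends on which automation tool you use.

```python
import json
import urllib.request

# Assumption: Firecrawl's hosted v1 scrape endpoint. Verify against current docs.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url, api_key):
    """Build a POST request asking the scraper to return the page as markdown."""
    payload = json.dumps({"url": page_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown(response_body, max_chars=20000):
    """Pull the markdown text out of the JSON response and trim it.

    max_chars is a hypothetical cap so the text fits in a form input.
    """
    parsed = json.loads(response_body)
    if not parsed.get("success"):
        raise RuntimeError(f"Scrape failed: {parsed.get('error', 'unknown error')}")
    markdown = parsed.get("data", {}).get("markdown", "")
    return markdown[:max_chars]


def scrape_page(page_url, api_key, timeout=60):
    """Fetch one page through the scraper and return clean markdown text."""
    request = build_scrape_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_markdown(response.read().decode("utf-8"))
```

Calling `scrape_page("https://example.com", "YOUR_API_KEY")` would return the page text, which the automation can then pass into the Pickaxe's text input in place of the raw URL.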