@ab2308 @nathaniel I found another way to achieve this. When I posted this thread, the requirement was a continuously synced DB. I now use Vectorize.io as a service. Vectorize lets you connect Google Drive, SharePoint, OneDrive and a host of other source connectors to a vector DB of your choice (Pinecone, AstraDB, etc.) and then provides a retrieval API for fetching the response. The good thing is that you can schedule the pipeline sync (which ingests the documents from the connected sources at a specific time). I was able to set it up as a Pickaxe action. Here is the code:
Ensure that the Authorization header DOESN'T contain "Bearer", else you'll get a 401 error.
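For anyone who wants a starting point, here is a minimal sketch of how such an action could be shaped in Python. It is not the exact action code: `RETRIEVAL_URL`, `ACCESS_TOKEN`, and the payload field names (`question`, `numResults`) are placeholders I am assuming for illustration, so check them against the retrieval endpoint details shown for your own pipeline. The one thing it does reflect from the note above is that the token goes into the Authorization header as-is, with no "Bearer" prefix.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values from your own pipeline.
RETRIEVAL_URL = "https://example.invalid/retrieval"  # assumption, not a real endpoint
ACCESS_TOKEN = "your-access-token"                   # assumption


def build_retrieval_request(question: str, num_results: int = 5) -> urllib.request.Request:
    """Build the POST request for the retrieval API without sending it."""
    # Payload field names are assumptions for illustration only.
    payload = json.dumps({"question": question, "numResults": num_results}).encode("utf-8")
    return urllib.request.Request(
        RETRIEVAL_URL,
        data=payload,
        method="POST",
        headers={
            # Raw token only: adding a "Bearer " prefix is what caused the 401.
            "Authorization": ACCESS_TOKEN,
            "Content-Type": "application/json",
        },
    )


def retrieve(question: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_retrieval_request(question)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to inspect the headers before anything is sent.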
Pros: 1. Extremely simple to set up, especially if you want real-time data sync. 2. They have built-in RAG evaluation on the platform.
Cons: 1. Adds to the cost (although I find Vectorize's pricing to be lower than others, and the free tier is good enough for small businesses).
2. Currently, their RAG system is rather naive, with fixed chunking. One of our use cases is multimodal RAG (for example, a geometry problem in a textbook, which requires both an image and text to be embedded together and then retrieved). We are still figuring out how to do it with Pickaxe.
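To show what I mean by fixed chunking, here is a generic sketch (my own illustration, not Vectorize's actual implementation; `fixed_size_chunks` is a hypothetical helper). The text is cut every `chunk_size` characters regardless of sentence or figure boundaries, which is why a problem statement and the figure it refers to can end up in different chunks.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of chunk_size characters, ignoring content boundaries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# A problem statement gets cut wherever the character count says so.
problem = "Find the area of triangle ABC shown in Figure 3, given AB = 5 and BC = 12."
for chunk in fixed_size_chunks(problem, chunk_size=30):
    print(repr(chunk))
```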
For those who ABSOLUTELY need continuous sync or need to use a specific vector DB like Pinecone, this ticks most of the boxes!

