
How to add new data on top of the existing indexed data with GraphRAG #360

Open
CraftsMan-Labs opened this issue Jul 4, 2024 · 5 comments

Comments

@CraftsMan-Labs

No description provided.

@bmaltais

bmaltais commented Jul 4, 2024

Do you mean updating an already indexed document, or adding new ones? Adding new ones works and will just index the added doc. I have not tried updating an existing document. The question could also be expanded to: can you remove an indexed document by deleting it from the input folder?

@CraftsMan-Labs
Author

Yep, adding new docs to an already indexed system. Yes, we would need both upsert and remove functionality, but for starters upsert would be great.

@bmaltais

bmaltais commented Jul 4, 2024

Adding new documents to the input folder will trigger indexing for those new documents. However, it will not index existing ones. Be aware that existing communities might get re-generated each time you add new documents, which can be time-consuming and consume valuable LLM credits.

It would be beneficial to have an option to create only new communities and skip reprocessing existing ones, allowing users to decide when to update existing community summaries. This approach would save significant LLM processing and cost, at the expense of a slight decrease in precision.

Personally, I prefer quickly indexing new documents, creating any necessary new communities, and then, at the end of the day, allowing the system to rebuild existing communities if needed based on the new documents added.
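The incremental workflow described above can be sketched as follows. This is an assumption-laden example: the `ragtest` root folder and the document names are placeholders, and the indexing command is based on the GraphRAG CLI as documented around this time; adjust to your setup.

```shell
# Drop new documents into the existing input folder (paths are examples)
cp new_report.txt ./ragtest/input/

# Re-run the indexer over the same root; already-indexed documents are
# detected and skipped, but note that community summaries may still be
# re-generated, which costs LLM calls
python -m graphrag.index --root ./ragtest
```

The key caveat from the discussion above applies: the second step can still rebuild existing communities, so the LLM cost is not limited to the new documents alone.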

@CraftsMan-Labs
Author

We can't regenerate a new parquet file and communities every time a single file is added when thousands of files are already preprocessed.
Like in other vector DBs, we need something that just adds the new data dynamically to the main DB and builds the relationships automatically.

@bmaltais

bmaltais commented Jul 4, 2024

We can't regenerate a new parquet file and communities every time a single file is added when thousands of files are already preprocessed. Like in other vector DBs, we need something that just adds the new data dynamically to the main DB and builds the relationships automatically.

I totally agree. Even with 10 files it quickly becomes super cumbersome and lengthy every time a new file is added to the mix.

The claim_extraction section has an `enabled: true` line that is commented out. I assume the default is false... So the same would be nice for community_report... maybe?

That would prevent new communities from being created though... so perhaps not optimal... Maybe a new optional variable called `only_generate_new_communities: true`?
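A sketch of what that could look like in `settings.yaml`. Note the caveats: the `community_reports` section name is assumed from the default GraphRAG settings template, and `only_generate_new_communities` is the option *proposed* in this comment, not a setting that exists today:

```yaml
# settings.yaml (sketch, based on the default GraphRAG template)
claim_extraction:
  # enabled: true        # shipped commented out; assumed to default to false

community_reports:
  enabled: true           # hypothetical analogue of the claim_extraction switch
  only_generate_new_communities: true  # proposed option, does NOT exist yet
```

With such a flag, re-indexing after adding documents would only summarize newly created communities, leaving existing community reports untouched until the user explicitly rebuilds them.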
