Support html and pdf as input file type #375
Unanswered
chenyang-shanghai asked this question in Q&A
Replies: 2 comments 3 replies
-
Integration with Document Intelligence and all its supported types would be excellent too.
1 reply
-
Hi! Currently we only support txt and csv inputs, as you pointed out. To process other file types, I add a preprocessing step that uses libraries such as pypdf to convert everything into the same format. The same approach works for docx and xlsx, among others. I really like the proposal of adding Document Intelligence support.
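For anyone who wants a starting point, here is a minimal sketch of that kind of preprocessing step. It is not part of graphrag; the function names (`html_to_text`, `pdf_to_text`, `convert_to_txt`) and the folder layout are made up for illustration. It assumes pypdf is installed for the PDF branch and uses only the standard library for HTML. Text extraction quality will vary, and scanned PDFs will likely come out empty without OCR.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip tags and return one text chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def pdf_to_text(path: Path) -> str:
    """Hypothetical helper: concatenate the extracted text of every page."""
    # Imported lazily so the HTML path works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(path))
    # extract_text() may yield nothing for image-only pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def convert_to_txt(src_dir: Path, out_dir: Path) -> list[Path]:
    """Write a .txt file for every .html/.htm/.pdf file found in src_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.iterdir()):
        suffix = src.suffix.lower()
        if suffix in (".html", ".htm"):
            text = html_to_text(src.read_text(encoding="utf-8", errors="replace"))
        elif suffix == ".pdf":
            text = pdf_to_text(src)
        else:
            continue  # leave other file types untouched
        dest = out_dir / (src.stem + ".txt")
        dest.write_text(text, encoding="utf-8")
        written.append(dest)
    return written
```

You could then call something like `convert_to_txt(Path("raw_docs"), Path("input"))` (both folder names are placeholders) and point the text input at the output folder.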
2 replies
-
In the config guide, we can only choose csv or text for "file_type":
https://microsoft.github.io/graphrag/posts/config/json_yaml/
However, HTML files (saved from an internal website) and PDFs are very common now. Does Microsoft GraphRAG support these input file types? If it does, any guide would be appreciated.