RAG not able to read .json files #9049
I have written this very basic RAG:
documents = SimpleDirectoryReader("/content/Data/").load_data()
I'm using llama_index for everything except the embed_model, because Colab had issues running llama_index.HuggingFaceEmbedding. Now I have this problem: the files in the Data folder are mostly .pdf, but there are some .json files too. Why is the RAG able to answer only about the PDFs? I tested with multiple prompts and it seems like the JSON files are not read at all, but I'm sure that documents contains such files too, so I don't know where the problem is or how I can solve it.
Replies: 1 comment 1 reply
🤖 Hello @LorenzoCutrupi, nice to meet you! I'm an AI developed to assist you with your technical questions, bug fixes, and guide you in becoming a more active contributor. While we're waiting for a human maintainer, I'm here to help you. Let's dig into your issue together!
The issue you're experiencing is due to the fact that the […]. To enable reading .json files, you would need to add a JSONReader to the […]. Alternatively, you can use the JSONReader directly:
from llama_index.readers.json import JSONReader
# Create an instance of JSONReader
reader = JSONReader()
# Use the instance to load data from a .json file
data = reader.load_data("<path_to_your_json_file>")
You can find more information about the […]. Please note that you would need to handle the merging of the data read by the […].
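The merging step mentioned in this reply could look roughly like the sketch below. It is not the code from the thread, only one way to do it: the helper names (`split_by_suffix`, `load_all`, `load_folder_with_llama_index`) are made up for illustration, and the llama_index imports assume the module layout used in the snippet above (newer releases may have moved these modules). The idea is to send .json files to `JSONReader` one at a time, pass everything else to `SimpleDirectoryReader` through its `input_files` argument, and concatenate the two lists.

```python
from pathlib import Path


def split_by_suffix(paths, suffix=".json"):
    """Split paths into (matching, other) by extension, case-insensitively."""
    matching, other = [], []
    for p in map(Path, paths):
        (matching if p.suffix.lower() == suffix else other).append(p)
    return matching, other


def load_all(paths, load_json, load_other):
    """Load non-JSON files in one batch, JSON files one by one, then merge."""
    json_files, other_files = split_by_suffix(paths)
    documents = []
    if other_files:
        documents.extend(load_other(other_files))
    for path in json_files:
        documents.extend(load_json(path))
    return documents


def load_folder_with_llama_index(folder):
    """Hypothetical wrapper; assumes the llama_index layout used in this thread."""
    # Imported lazily so the helpers above work without llama_index installed.
    from llama_index import SimpleDirectoryReader
    from llama_index.readers.json import JSONReader

    json_reader = JSONReader()
    paths = [p for p in Path(folder).rglob("*") if p.is_file()]
    return load_all(
        paths,
        load_json=lambda path: json_reader.load_data(str(path)),
        load_other=lambda files: SimpleDirectoryReader(
            input_files=[str(f) for f in files]
        ).load_data(),
    )
```

With this in place, the single `SimpleDirectoryReader("/content/Data/").load_data()` call from the question would become `documents = load_folder_with_llama_index("/content/Data/")`, and the rest of the pipeline stays unchanged.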
Taking inspiration from this reply (which is basically the answer), I rewrote the code for extracting documents in this way:
So basically the first line of the code in the question is now substituted w…