[Question]: Knowledge Graph Vs RAG #8393
Replies: 7 comments 1 reply
-
🤖 Hello, Thank you for your interest in LlamaIndex and for your detailed question. It seems like you're on the right track with your approach to using a Knowledge Graph (KG) for your use case. To address your concerns:
For your use case, you might need to preprocess your data to ensure that it is in a format that can be easily parsed by the model. This might involve converting your Excel data into a structured format like JSON or CSV, and structuring your text in a way that makes it easier for the model to extract the relevant triplets. Here's a sample code snippet on how you can use LlamaIndex for your use case:

```python
from llama_index import (
    KnowledgeGraphIndex,
    ServiceContext,
    SimpleDirectoryReader,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.llms import OpenAI
from llama_index.query_engine import KnowledgeGraphQueryEngine

# Placeholders -- replace with your own NebulaGraph schema and data path.
space_name = "llamaindex"
edge_types = ["relationship"]
rel_prop_names = ["relationship"]
tags = ["entity"]

# Load your (preprocessed) documents from a local directory.
documents = SimpleDirectoryReader("./data").load_data()

llm = OpenAI(temperature=0, model="text-davinci-002")
service_context = ServiceContext.from_defaults(llm=llm, chunk_size_limit=512)

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# Build the knowledge graph index: the LLM extracts triplets from each chunk.
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    service_context=service_context,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
    include_embeddings=True,
)

# Query the graph in natural language.
query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)
response = query_engine.query("Your question here")
```

Please replace the placeholders with your actual data and parameters. I hope this helps! If you have any more questions or need further clarification, feel free to ask. Best,
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
🤖 Hello, Thank you for your detailed question. It seems like you're on the right track with using LlamaIndex's Knowledge Graph (KG) for your use case.

For your first two questions, as you mentioned, a simple vector store or even a traditional information retrieval system might suffice. However, for the last two questions, you're correct that a more complex system like a KG would be beneficial due to the need to retrieve multiple related chunks of information.

In terms of the issues you're facing with the KG, there might be some confusion about how the graph is generated and how triplets are extracted from the text. The graph is generated based on the triplets that are extracted from the text. Each triplet consists of a subject, predicate, and object, which form the nodes and edges of the graph. The extraction of triplets is handled by the LLM, which is prompted to pull up to `max_triplets_per_chunk` triplets out of each chunk, so the quality of the graph depends heavily on the quality of that extraction. There are a few limitations to this process that might be causing the issues you're seeing.

You can find more details about this process in the KnowledgeGraphIndex class. As for the image you've shared, I'm sorry but I'm unable to view images. Could you please describe the issue you're facing with the graph in more detail? I hope this helps! If you have any more questions or need further clarification, please don't hesitate to ask.
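To make the extraction step concrete, here is a toy rule-based extractor that returns (subject, predicate, object) tuples in the shape the index expects. This is only an illustration of the triplet format: LlamaIndex itself uses an LLM prompt for extraction, and the fixed predicate list below is an assumption for the example.

```python
import re
from typing import List, Tuple

Triplet = Tuple[str, str, str]

# Hypothetical predicates for illustration only; the real extraction is
# done by prompting an LLM, not by pattern matching.
PREDICATES = ["worked on", "belongs to", "covers"]

def extract_triplets(text: str) -> List[Triplet]:
    """Return (subject, predicate, object) tuples found in the text."""
    triplets: List[Triplet] = []
    for pred in PREDICATES:
        for m in re.finditer(rf"(\w[\w ]*?)\s+{pred}\s+(\w[\w ]*)", text):
            triplets.append((m.group(1).strip(), pred, m.group(2).strip()))
    return triplets

print(extract_triplets("author worked on programming"))
# -> [('author', 'worked on', 'programming')]
```

Whatever produces the triplets, the downstream graph treats the first and third elements as nodes and the second as the edge label, so malformed triplets at this stage show up directly as a malformed graph.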
-
@dosu-bot So for my points 3 & 4, is this the best approach? I have tried to manually add triplets to the graph (`node_0_tups = [...]`, the same code from llama-index I am using). But the generated graph has a relation like this: the two nodes are "author" and "worked on", and the arrow shows the relation as "author" too. What I have seen is that when inserting triplets, the first and second elements are taken as nodes and the first element is taken as the relation in all cases. Is that the correct method? In reality, shouldn't the first and third elements be taken as nodes and the second element as the relation? Also, what is the use of inserting nodes[0] here: index.upsert_triplet_and_node(tup, nodes[0])? For creating the KG?
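The (subject, predicate, object) convention the comment describes can be sanity-checked with a minimal in-memory sketch. This is an illustration of the convention, not LlamaIndex's internals:

```python
# Minimal in-memory graph following the (subject, predicate, object)
# convention: subject and object become nodes, predicate becomes the
# edge label. Illustration only, not LlamaIndex's implementation.
nodes = set()
edges = []  # (subject_node, edge_label, object_node)

def upsert_triplet(subj: str, pred: str, obj: str) -> None:
    nodes.update([subj, obj])
    edges.append((subj, pred, obj))

upsert_triplet("author", "worked on", "writing")
upsert_triplet("author", "worked on", "programming")

print(sorted(nodes))  # ['author', 'programming', 'writing']
print(edges[0][1])    # 'worked on'  <- the relation is the 2nd element
```

Under this convention, "worked on" should never appear as a node, which is why the rendered graph in the screenshot looks wrong.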
-
@vishnu9000 hmm, I think it's just a bug with drawing the graph tbh, especially since we have unit tests for upserting triplets like that.
-
@logan-markewich ohh.. so is this a bug with llama index? Any way to rectify it? If this is the case, then the same bug will be there for the other KG options in llama index too, right?
-
@vishnu9000 could you give it a try on NebulaGraph first? That's the main graph store I kept changing/iterating/optimizing towards. Will spend some time polishing the SimpleStore and the mutation of the kg_index later.
-
@wey-gu Thanks for the info, I will check it out. This might be a dumb question, but how do I create a graph from text for NebulaGraph? Should it be done manually, or should I use an LLM for extracting entities? Wouldn't that take too much time? I have 100 Excel files with test case id, component, test case name, steps, results. How do I create graph entities from this?
-
Question Validation
Question
Hi guys, I am new to llama index and LLMs. I am working on a use case for document question answering and I am confused about which path to take. I have a lot of Excel files that contain test case details like component, test case id, test case name, test case steps, expected results. I want to create a QA system that should be able to answer the following questions:
Here the first 2 questions can be done with a simple vector store. But the last two cannot be achieved by a vector store, due to the limitations of vector-store RAG, where chunks are saved and retrieved by similarity. Say I ask: give me the test cases for component y. The pipeline will retrieve the 5 best-matched chunks, but I may have 20 such test cases. And for the last question, the model might need to go to multiple chunks to get accurate results. In my case, each test case (component, test case id, test case name, test case steps, expected results) is a chunk, so each chunk itself already has a good amount of size.
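The top-k limitation described above is easy to see in a toy retriever: with a top-5 retriever, only 5 of the 20 matching test cases are ever returned. The crude word-overlap scoring below is a hypothetical stand-in for embedding similarity, not LlamaIndex code:

```python
# Toy illustration of the top-k limitation: 20 chunks match component "y",
# but a top-5 retriever returns only 5 of them. Hypothetical scoring
# function, not LlamaIndex code.
chunks = [f"test case {i} for component y" for i in range(20)]
chunks += [f"test case {i} for component x" for i in range(10)]

def score(chunk: str, query: str) -> int:
    # Crude overlap score: number of shared words.
    return len(set(chunk.split()) & set(query.split()))

query = "test cases for component y"
top_k = 5
ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
retrieved = ranked[:top_k]

matching = [c for c in chunks if "component y" in c]
print(len(matching), len(retrieved))  # 20 matching, only 5 retrieved
```

Raising `top_k` helps only up to the context window; a structured store (KG or a metadata-filtered query) is the more robust fix when "all test cases for component y" must be returned exhaustively.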
That's when I thought of a Knowledge Graph and looked into the llama index framework. I am using a local Llama 2 quantized model for data security reasons. I have checked out the following sections.
I have tried to add triplets manually as in the docs:

```python
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])
```
But this is the graph it's generating. I am really confused about how the relationships ended up like this.
I was hoping a KG could help with all the use cases I have listed above. Can someone help me with this? I am stuck. Am I doing anything wrong? Is this not the correct approach?