Replies: 1 comment
🤖 Hello @light42, I'm Dosu, a friendly bot here to assist you with questions, bugs, and contributions related to LlamaIndex while we wait for a human maintainer. Please feel free to reach out if you need any further assistance.

Based on solved issues in the LlamaIndex repository, you can indeed use existing embeddings without having to re-embed the data. The first issue, [Question]: Embedding before index?, suggests constructing the documents or nodes ahead of time and attaching the embeddings yourself:

```python
node = TextNode(text="...", embedding=[...])
document = Document(text="...", embedding=[...])
```
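As a rough sketch of how such pre-embedded nodes could be turned into an index (import paths vary across llama-index versions, the vectors below are placeholder values rather than real embeddings, and an embedding model is still assumed to be configured for embedding the query string at query time):

```python
from llama_index import VectorStoreIndex
from llama_index.schema import TextNode

# Nodes built ahead of time, each carrying an embedding you already computed.
# The constant 512-dim vectors here are stand-ins for your real embeddings.
nodes = [
    TextNode(text="Paul Graham grew up in England.", embedding=[0.1] * 512),
    TextNode(text="He later moved to the US.", embedding=[0.2] * 512),
]

# Since every node already has an embedding attached, building the index
# should not trigger new embedding-model calls for these texts.
index = VectorStoreIndex(nodes=nodes)
```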
The second issue, Using llama index with existing postgres db and pgvector extension, suggests that you can connect to your existing table and query it:

```python
vector_store = SupabaseVectorStore(
    postgres_connection_string="postgresql://postgres:[email protected]:5432/postgres",
    collection_name="test_vector_col",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
response = query_engine.query("Who is the author?")
```
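One caveat about the snippet above: `from_documents(...)` re-inserts (and re-embeds) the documents you pass in. If the rows and their embeddings already live in the table, a sketch like the following, using `VectorStoreIndex.from_vector_store`, attaches to the existing store without inserting anything (assuming a llama-index version where that classmethod is available):

```python
from llama_index import VectorStoreIndex

# Build an index on top of the existing vector store; nothing is inserted
# or embedded at construction time, so the stored rows are used as-is.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Who is the author?")
```

At query time only the query string itself is embedded (a single call); the existing embedding column is never recomputed.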
However, if you want to use an existing SQL vector database and retrieve the top-k rows based on the existing embedding column without re-embedding the data when you call query(), you can query the vector store directly:

```python
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.vector_stores.types import VectorStoreQuery, VectorStoreQueryMode

# Initialize PineconeVectorStore
store = PineconeVectorStore(
    api_key="<your_api_key>",
    index_name="<your_index_name>",
    environment="<your_environment>",
    namespace="<your_namespace>",
    add_sparse_vector=False,  # set to False to avoid re-embedding
)

# Prepare your query
query = VectorStoreQuery(
    query_embedding=<your_query_embedding>,  # your precomputed query embedding
    similarity_top_k=<your_top_k>,  # number of top similar nodes to retrieve
    mode=VectorStoreQueryMode.DEFAULT,  # DEFAULT mode queries based on embeddings
)

# Query the store
result = store.query(query)
```

In this example, please replace the `<...>` placeholders with your own values.
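If it helps, here is a small sketch of what you might do with the result. `store.query` should return a `VectorStoreQueryResult`, whose `nodes` and `similarities` fields are both optional:

```python
# Inspect the matched nodes and their similarity scores.
for node, score in zip(result.nodes or [], result.similarities or []):
    print(f"score={score:.3f}  id={node.node_id}")
    print(node.get_content()[:200])  # first 200 characters of the node text
```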
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
According to the documentation, you first use a loader to load the data and then feed that data to VectorStoreIndex.

When I feed that data to VectorStoreIndex, it automatically creates embeddings for the rows in the database, which I find weird since the table already has an embedding column.

I don't want to redo the embedding process; I just want llama-index to retrieve the top-k rows from the database based on the existing embedding column whenever I call query(). How do I do that?
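For reference, this is the flow described above (a minimal sketch; `SimpleDirectoryReader` stands in for whichever loader is actually used):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Default flow: the loader returns plain documents with no embeddings
# attached, so from_documents() embeds every chunk from scratch.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
```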