Replies: 1 comment
🤖 Hello @light42, I'm Dosu, a friendly bot here to assist you with questions, bugs, and contributions related to LlamaIndex while we wait for a human maintainer. Please feel free to reach out if you need any further assistance.

Based on solved issues in the LlamaIndex repository, you can indeed use existing embeddings without having to re-embed the data. The first issue, [Question]: Embedding before index?, suggests constructing the documents or nodes ahead of time and attaching the embeddings yourself:

```python
node = TextNode(text="...", embedding=[...])
document = Document(text="...", embedding=[...])
```
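As a rough sketch of how such pre-embedded nodes could be turned into an index (import paths vary across llama-index versions, the vectors below are placeholder values rather than real embeddings, and an embedding model is still assumed to be configured for embedding the query string at query time):

```python
from llama_index import VectorStoreIndex
from llama_index.schema import TextNode

# Nodes built ahead of time, each carrying an embedding you already computed.
# The constant 512-dim vectors here are stand-ins for your real embeddings.
nodes = [
    TextNode(text="Paul Graham grew up in England.", embedding=[0.1] * 512),
    TextNode(text="He later moved to the US.", embedding=[0.2] * 512),
]

# Since every node already has an embedding attached, building the index
# should not trigger new embedding-model calls for these texts.
index = VectorStoreIndex(nodes=nodes)
```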
The second issue, Using llama index with existing postgres db and pgvector extension, suggests that you can connect to your existing table and query it:

```python
vector_store = SupabaseVectorStore(
    postgres_connection_string="postgresql://postgres:[email protected]:5432/postgres",
    collection_name="test_vector_col",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
response = query_engine.query("Who is the author?")
```
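One caveat about the snippet above: `from_documents(...)` re-inserts (and re-embeds) the documents you pass in. If the rows and their embeddings already live in the table, a sketch like the following, using `VectorStoreIndex.from_vector_store`, attaches to the existing store without inserting anything (assuming a llama-index version where that classmethod is available):

```python
from llama_index import VectorStoreIndex

# Build an index on top of the existing vector store; nothing is inserted
# or embedded at construction time, so the stored rows are used as-is.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("Who is the author?")
```

At query time only the query string itself is embedded (a single call); the existing embedding column is never recomputed.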
However, if you want to use an existing SQL vector database and retrieve the top-k rows based on the existing embedding column without re-embedding the data when you call query(), you can query the vector store directly:

```python
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.vector_stores.types import VectorStoreQuery, VectorStoreQueryMode

# Initialize PineconeVectorStore
store = PineconeVectorStore(
    api_key="<your_api_key>",
    index_name="<your_index_name>",
    environment="<your_environment>",
    namespace="<your_namespace>",
    add_sparse_vector=False,  # set to False to avoid re-embedding
)

# Prepare your query
query = VectorStoreQuery(
    query_embedding=<your_query_embedding>,  # your precomputed query embedding
    similarity_top_k=<your_top_k>,  # number of top similar nodes to retrieve
    mode=VectorStoreQueryMode.DEFAULT,  # DEFAULT mode queries based on embeddings
)

# Query the store
result = store.query(query)
```

In this example, please replace the `<...>` placeholders with your own values.
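If it helps, here is a small sketch of what you might do with the result. `store.query` should return a `VectorStoreQueryResult`, whose `nodes` and `similarities` fields are both optional:

```python
# Inspect the matched nodes and their similarity scores.
for node, score in zip(result.nodes or [], result.similarities or []):
    print(f"score={score:.3f}  id={node.node_id}")
    print(node.get_content()[:200])  # first 200 characters of the node text
```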
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
According to the documentation, you first use a loader to load the data and then feed that data to VectorStoreIndex.

When I feed that data to VectorStoreIndex, it automatically creates embeddings for the rows in the database, which I find weird since the table already has an embedding column.

I don't want to redo the embedding process; I just want llama-index to retrieve the top-k rows from the database based on the existing embedding column whenever I call query(). How do I do that?
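For reference, this is the flow described above (a minimal sketch; `SimpleDirectoryReader` stands in for whichever loader is actually used):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Default flow: the loader returns plain documents with no embeddings
# attached, so from_documents() embeds every chunk from scratch.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
```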