Replies: 4 comments 4 replies
-
🤖 Hello, The difference in speed between the two setups can come from several factors. The batch size used when embedding and whether the model runs on CPU or GPU both matter, and the complexity of the model used for embeddings can also affect the speed: more complex models might generate better embeddings but take longer to process. To increase the speed of the LlamaIndex HuggingFaceEmbedding, you can try increasing the embedding batch size, running the model on a GPU, or switching to an ONNX-optimized model via OptimumEmbedding.
Here is an example of how you can modify the HuggingFaceEmbedding:

from llama_index.embeddings import HuggingFaceEmbedding

embedding = HuggingFaceEmbedding(
    model_name='bert-base-uncased',
    tokenizer_name='bert-base-uncased',
    embed_batch_size=512,  # Increase batch size
    device='cuda'  # Use GPU
)

And here is how you can create and use an OptimumEmbedding:

from llama_index.embeddings import OptimumEmbedding
OptimumEmbedding.create_and_save_optimum_model(
"BAAI/bge-small-en-v1.5", "./bge_onnx"
)
embed_model = OptimumEmbedding(folder_name="./bge_onnx")

Please note that the actual values for these parameters depend on your hardware and use case, so you may need to adjust them. You can find more information in the llama_index/embeddings/huggingface.py and docs/examples/embeddings/huggingface.ipynb files. I hope this helps! Let me know if you have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
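As a rough follow-up sketch to the OptimumEmbedding example above: once the ONNX model has been saved and loaded, embeddings are generated through the same interface as any other LlamaIndex embedding model (this assumes the standard BaseEmbedding methods get_text_embedding / get_text_embedding_batch; method names may differ in newer releases).

texts = ["first document", "second document"]

# Embed a single string
vector = embed_model.get_text_embedding(texts[0])

# Embed several strings in one call, which is usually faster than looping
vectors = embed_model.get_text_embedding_batch(texts)

print(len(vector), len(vectors))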
-
Could you try the
-
If I understand correctly, based on the relevant Python file, the HuggingFaceEmbedding module in LlamaIndex communicates with the HuggingFace API over the internet. Right? If yes, then maybe the slow performance is caused by the API calls...
-
Hello, I use the OptimumEmbedding model for embedding. However, it seems that the ONNX model is running on the CPU instead of the GPU. Is there anything I need to set?
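One quick thing to check here is whether ONNX Runtime can see the GPU at all. This is a minimal sketch using the standard onnxruntime API; how a GPU provider is passed to OptimumEmbedding itself may vary by llama_index version, so this only checks the runtime side.

import onnxruntime

# If only 'CPUExecutionProvider' is listed, the CPU-only onnxruntime package
# is installed; GPU inference typically requires onnxruntime-gpu so that
# 'CUDAExecutionProvider' appears in this list.
print(onnxruntime.get_available_providers())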
-
Question
Hello,
I have a question regarding the embeddings generation process in LlamaIndex. If I use

from llama_index.embeddings import HuggingFaceEmbedding

instead of the LangChain integration (from llama_index import VectorStoreIndex, SimpleDirectoryReader, LangchainEmbedding), the time needed to generate the embeddings is longer. I timed the process and found that the LangChain integration generates the embeddings 4 to 7 times faster with the same model (paraphrase-MiniLM-L6-v2). Do you know why this happens? How can I increase the speed of the LlamaIndex HuggingFaceEmbedding? I ask because I want to use as few libraries as possible, so I would prefer to keep only LlamaIndex rather than both frameworks in my project's requirements.
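For reference, a rough sketch of the kind of timing comparison described above. The model name comes from the question; get_text_embedding_batch and the LangchainEmbedding wrapper are assumed to follow the llama_index / langchain APIs in use at the time of this thread and may differ in newer releases.

import time

from llama_index import LangchainEmbedding
from llama_index.embeddings import HuggingFaceEmbedding
from langchain.embeddings import HuggingFaceEmbeddings

texts = ["an example sentence to embed"] * 1000
model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"

# Native LlamaIndex embedding
hf_model = HuggingFaceEmbedding(model_name=model_name)
start = time.time()
hf_model.get_text_embedding_batch(texts)
print("HuggingFaceEmbedding:", time.time() - start)

# LangChain embedding wrapped for use inside LlamaIndex
lc_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name))
start = time.time()
lc_model.get_text_embedding_batch(texts)
print("LangchainEmbedding:", time.time() - start)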