[Question]: Integration Issue with Finetuned Embedding Inference Endpoint and TextEmbeddingsInference #16987

Open
1 task done
kouskouss opened this issue Nov 18, 2024 · 3 comments
Labels
question Further information is requested

Comments

@kouskouss

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

We recently created an Inference Endpoint on Hugging Face, which we use to run a fine-tuned embedding model on their GPU service. The endpoint itself works as intended, but the request format it expects is difficult to integrate with our existing codebase. For a smoother integration, we tried the TextEmbeddingsInference class from llama_index.embeddings.text_embeddings_inference, since it fits our current implementation better. We created the object with the following setup:
embed_model = TextEmbeddingsInference(model_name=model, base_url=API_URL, timeout=60, auth_token=key)
However, calling embed_model.get_text_embedding("Test") raises a KeyError: 0 exception.
Has anyone run into the same issue, or can anyone offer guidance or suggest corrections to our approach?
Any help in resolving this would be greatly appreciated, as using TextEmbeddingsInference directly would substantially streamline our integration.

Thank you in advance.

kouskouss added the question label (Further information is requested) on Nov 18, 2024

dosubot bot commented Nov 18, 2024

I found a similar closed issue regarding a problem with the Hugging Face embedding model and LlamaIndex versions [1]. Although it doesn't directly address the KeyError: 0 exception you're encountering, it might be related to compatibility issues between different versions of the library.

For your specific issue, ensure that the request payload format matches the expected structure for the TextEmbeddingsInference class, which should be a JSON object like this:

{
    "inputs": ["text1", "text2", ...],
    "truncate": true_or_false
}
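As a rough sketch of what a request with that payload could look like, the snippet below builds (but does not send) a POST request using only the standard library. The `/embed` route, the Bearer token header, the placeholder URL and token, and the helper name `build_embed_request` are assumptions for illustration; none of them are confirmed in this thread.

```python
import json
import urllib.request


def build_embed_request(base_url, texts, token, truncate=True):
    """Build (but do not send) a POST request carrying the payload shown above.

    Hypothetical helper: the "/embed" route and the Bearer auth header are
    assumptions about the server, not something confirmed in this thread.
    """
    payload = {"inputs": list(texts), "truncate": truncate}
    url = base_url.rstrip("/") + "/embed"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder URL and token, for illustration only.
req = build_embed_request("https://example.invalid", ["Test"], token="hf_xxx")
print(req.full_url)
print(req.data.decode("utf-8"))
```

Printing the URL and body before sending anything makes it easier to compare them with the request format the endpoint's own page documents.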

Make sure that the response from the Hugging Face inference endpoint is structured correctly and that all necessary keys are present to avoid KeyError exceptions. Additionally, verify that you are using the correct dependencies and library versions required for the TextEmbeddingsInference class to function properly with Hugging Face endpoints:

  • python: version >=3.8.1,<4.0
  • llama-index-core: version ^0.11.0
  • huggingface-hub: version >=0.19.0 with the inference extra
  • sentence-transformers: version >=2.6.1 [2][3][4].
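To check the response structure as suggested above, a small validator along these lines might help. It assumes (this is not confirmed in the thread) that a successful response is a JSON list with one embedding per input, and that a failing endpoint may return a JSON object instead. Indexing such an object with `[0]` raises KeyError: 0 in plain Python, which is one possible explanation for the reported exception. The function name `extract_embeddings` is hypothetical.

```python
def extract_embeddings(body):
    """Validate a decoded JSON response and return the list of embeddings.

    Assumption: success looks like [[float, ...], ...]; anything else is
    reported with a descriptive error instead of a bare KeyError.
    """
    if isinstance(body, dict):
        raise ValueError(
            f"expected a list of embeddings, got an object with keys {sorted(body)}"
        )
    if not isinstance(body, list) or not body:
        raise ValueError("expected a non-empty list of embeddings")
    for item in body:
        if not isinstance(item, list) or not all(
            isinstance(v, (int, float)) for v in item
        ):
            raise ValueError("each embedding should be a list of numbers")
    return body


# A dict indexed with 0 fails the same way as the reported error.
try:
    {"error": "some message"}[0]
except KeyError as exc:
    print("KeyError:", exc)

print(extract_embeddings([[0.1, 0.2, 0.3]]))
```

Running the raw response through a check like this before handing it to an embedding class shows whether the endpoint is returning embeddings or an error object.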

If the issue persists, you might want to check the specific configuration settings or parameters required for the TextEmbeddingsInference class, such as model_name, token, timeout, and others, to ensure they are correctly set up [5][6].

@logan-markewich
Collaborator

@kouskouss I think you used the wrong class? This class is meant for https://github.com/huggingface/text-embeddings-inference

You probably meant to use HuggingFaceInferenceAPIEmbedding ?

embed_model = HuggingFaceInferenceAPIEmbedding(model_name=API_URL, token=key) ?

@kouskouss
Author

@logan-markewich I tried this class too, and then called:

async def get_embedding():
    text = "Test"
    embedding = await embed_model.aget_text_embedding(text)
    print(embedding)
# Run the async function
await get_embedding()

It returned an array of NaN values, so it did not work either. As I understand it, the text inference classes do not support embedding models that are deployed on a Hugging Face Inference Endpoint.
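For anyone trying to narrow this down, a small check like the one below could confirm whether every value in the returned embedding is NaN or only some of them. This is only a sketch: `find_invalid_values` is a hypothetical helper, and the sample list stands in for whatever the embedding call returns.

```python
import math


def find_invalid_values(embedding):
    """Return the indices of NaN or infinite entries in an embedding."""
    return [
        i for i, v in enumerate(embedding)
        if math.isnan(v) or math.isinf(v)
    ]


# Stand-in data; in practice this would be the list returned by the model.
sample = [0.12, float("nan"), -0.5, float("nan")]
bad = find_invalid_values(sample)
print(f"{len(bad)} of {len(sample)} values are invalid at indices {bad}")
```

If every index comes back invalid, the problem is more likely in how the response is being parsed or in the model output itself than in a single bad value.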
