[Question]: When customizing the LLM using VectorStoreIndex, it reports an error #8368
Replies: 11 comments
-
index = VectorStoreIndex.from_documents(documents, service_context=service_context) would call the OpenAI embedding model by default.
-
The latter is "Using a HuggingFace LLM".
-
Hi, I assume that it still uses OpenAI embeddings to encode your documents and query text into vectors.
-
Yes, the HF model just runs once for a single question and then the program stops. I'm trying to use the chat_engine for a continuous conversation. Thanks for your help, I appreciate it.
-
@Danny-C-Auditore Yeah, there are two models in llama-index -- the LLM and the embedding model. You'll need to set both in the service context in order to avoid OpenAI. You can learn more about setting up custom embeddings here:
-
Hmm, it sounds to me like you just need to implement your own custom embedding model class. Normally the tokenization is expected to be handled by the model class. There's an example here of a completely custom embedding class that subclasses the base class.
-
Hi, @Danny-C-Auditore! I'm Dosu, and I'm helping the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you encountered an error stating "openai.error.AuthenticationError: No API key provided" when customizing the LLM using VectorStoreIndex. You mentioned that you didn't try to use the OpenAI model and were confused about why an API key was required. A user named aetherwu suggested that the OpenAI embedding is called by default and asked how the llm connection was configured. You shared screenshots of your code and mentioned that you were trying to use the chat_engine for a continuous conversation. Another user named logan-markewich explained that both the LLM and the embedding model need to be set in the service context to avoid using the OpenAI model, and provided a link for setting up custom embeddings. It seems that this suggestion resolved the issue for you.

Before we close this issue, could you please confirm if it is still relevant to the latest version of the LlamaIndex repository? If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you for your understanding and contribution to the LlamaIndex repository!
-
🤖 Hello,

The error you're encountering is due to the way the LlamaIndex framework is implemented. Even if you're not directly using the OpenAI model, the LlamaIndex framework might be using OpenAI's API in the background for some operations. Therefore, it requires the OpenAI API key to be set. There are a few similar issues that have been solved in the LlamaIndex repository:

I hope this helps! If you have any other questions, feel free to ask.

Best,

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
Question Validation
Question
Traceback (most recent call last):
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/embeddings/openai.py", line 165, in get_embeddings
data = openai.Embedding.create(input=list_of_text, model=engine, **kwargs).data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/openai/api_resources/embedding.py", line 33, in create
response = super().create(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 149, in create
) = cls.__prepare_create_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 106, in __prepare_create_request
requestor = api_requestor.APIRequestor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/openai/api_requestor.py", line 138, in __init__
self.api_key = key or util.default_api_key()
^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/openai/util.py", line 186, in default_api_key
raise openai.error.AuthenticationError(
openai.error.AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/yaoyunzhang897/llama-index/test.py", line 82, in <module>
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/base.py", line 102, in from_documents
return cls(
^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 46, in __init__
super().__init__(
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/base.py", line 71, in __init__
index_struct = self.build_index_from_nodes(nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 241, in build_index_from_nodes
return self._build_index_from_nodes(nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 229, in _build_index_from_nodes
self._add_nodes_to_index(
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 201, in _add_nodes_to_index
embedding_results = self._get_node_embedding_results(nodes, show_progress)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 111, in _get_node_embedding_results
) = self._service_context.embed_model.get_queued_text_embeddings(show_progress)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home//.local/lib/python3.11/site-packages/llama_index/embeddings/base.py", line 180, in get_queued_text_embeddings
embeddings = self._get_text_embeddings(cur_batch_texts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/llama_index/embeddings/openai.py", line 285, in _get_text_embeddings
return get_embeddings(
^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaoyunzhang897/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 326, in iter
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fd5edf13d10 state=finished raised AuthenticationError>]
It reports "openai.error.AuthenticationError: No API key provided."
But the question is: I didn't try to use the OpenAI model, so why would I need the API key?