
openai ratelimit error and ollama too slow on cpu #293

Open
Arslan-Mehmood1 opened this issue Nov 17, 2024 · 3 comments

Comments


Arslan-Mehmood1 commented Nov 17, 2024

code:

```python
import os
import time
from tqdm.auto import tqdm

os.environ['OPENAI_API_KEY'] = 'abc'

import pdfplumber
from lightrag import LightRAG, QueryParam
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

from lightrag.llm import gpt_4o_mini_complete

WORKING_DIR = "../run_2"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# ========================================  OpenAI  ======================================== #
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete,
    # llm_model_func=gpt_4o_complete
)
# ========================================  Ollama  ======================================== #
# rag = LightRAG(
#    working_dir=WORKING_DIR,
#    chunk_token_size=1200,
#    llm_model_func=ollama_model_complete,
#    llm_model_name="llama3.2:1b",
#    llm_model_max_async=1,
#    llm_model_max_token_size=32768,
#    llm_model_kwargs={"host": "http://localhost:11434", "options": {"num_ctx": 32768}},
#    embedding_func=EmbeddingFunc(
#        embedding_dim=768,
#        max_token_size=8192,
#        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text", host="http://localhost:11434"),
#    ),
# )


pdf_path = "../CompaniesAct2013.pdf"  # or Constitution_of_India.pdf
pdf_text = ""
with pdfplumber.open(pdf_path) as pdf:
    # pages[14:20] is 6 pages, so total=6 (not total=2)
    for page in tqdm(pdf.pages[14:20], desc='Extract Text from pages', total=6):
        # extract_text() can return None on empty pages
        pdf_text += (page.extract_text() or "") + "\n"


rag.insert(pdf_text)
```


```
INFO:lightrag:Logger initialized for working directory: ../run_2
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../run_2/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../run_2/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../run_2/vdb_chunks.json'} 0 data
Extract Text from pages: 6it [00:02, 2.67it/s]
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 5 chunks
INFO:lightrag:Inserting 5 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.449477 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.878241 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.389104 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.882345 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.469158 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.888185 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/llm.py", line 550, in openai_embedding
    response = await openai_async_client.embeddings.create(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/resources/embeddings.py", line 236, in create
    return await self._post(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1839, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1533, in request
    return await self._request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1619, in _request
    return await self._retry_request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1666, in _retry_request
    return await self._request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1619, in _request
    return await self._retry_request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1666, in _retry_request
    return await self._request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1634, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/1_run_ollama.py", line 63, in <module>
    rag.insert(pdf_text)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/lightrag.py", line 227, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/lightrag.py", line 271, in ainsert
    await self.chunks_vdb.upsert(inserting_chunks)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/storage.py", line 98, in upsert
    embeddings_list = await asyncio.gather(
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/utils.py", line 89, in wait_func
    result = await func(*args, **kwargs)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/utils.py", line 45, in __call__
    return await self.func(*args, **kwargs)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/__init__.py", line 419, in exc_check
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x73ece6a8e8c0 state=finished raised RateLimitError>]
```

@Arslan-Mehmood1 Arslan-Mehmood1 changed the title openai ratelimit error openai ratelimit error and ollama too slow on cpu Nov 17, 2024
LarFii (Collaborator) commented Nov 19, 2024

This is indeed a challenging issue to avoid. However, you can add a delay and retry the insertion process after encountering a rate limit error with OpenAI. Since already extracted content is cached, it won't be reprocessed. Unfortunately, for local models, we currently don't have an effective way to significantly improve the speed on CPU.
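The delay-and-retry approach above can be sketched as a small wrapper around `rag.insert`. The helper `insert_with_retry` below is hypothetical (not part of LightRAG); it assumes the `rag.insert(text)` call shown in the original snippet and relies on LightRAG's cache so that a retried insert does not redo finished work:

```python
import time

def insert_with_retry(insert_fn, text, max_attempts=5, base_delay=10.0):
    """Call insert_fn(text), sleeping with exponential backoff between failures.

    Hypothetical helper, not a LightRAG API. Because already-extracted
    content is cached, re-running the insert skips completed work.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return insert_fn(text)
        except Exception as exc:  # e.g. openai.RateLimitError / tenacity.RetryError
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)

# usage (assuming `rag` and `pdf_text` from the snippet above):
# insert_with_retry(rag.insert, pdf_text)
```

Note this only helps with per-minute rate limits; the `insufficient_quota` error in the traceback above means the account has no remaining credit, and no amount of retrying fixes that.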

m-aliabbas commented

Facing a rate limit (per minute) issue.
I added a `time.sleep` call in `openai/_base_client.py` (line 1611, in `_request`):

```python
    def _make_status_error_from_response(
        self,
        response: httpx.Response,
    ) -> APIStatusError:
        if response.is_closed and not response.is_stream_consumed:
            # We can't read the response body as it has been closed
            # before it was read. This can happen if an event hook
            # raises a status error.
            body = None
            err_msg = f"Error code: {response.status_code}"
        else:
            err_text = response.text.strip()
            body = err_text

            try:
                body = json.loads(err_text)
                err_msg = f"Error code: {response.status_code} - {body}"
            except Exception:
                err_msg = err_text or f"Error code: {response.status_code}"
        time.sleep(1.0)  # added line: crude 1s pause whenever a status error is built
        return self._make_status_error(err_msg, body=body, response=response)
```
and it's working.
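Patching installed library source works, but the edit is lost on every reinstall or upgrade. A less invasive alternative is to throttle your own embedding calls before they reach the OpenAI client. The `throttle` decorator below is a hypothetical stdlib-only sketch (not a LightRAG or OpenAI API); you would wrap whatever embedding function you pass to LightRAG's `EmbeddingFunc`:

```python
import functools
import time

def throttle(min_interval: float):
    """Decorator: enforce at least `min_interval` seconds between calls.

    Hypothetical helper. Wrapping your own function at the call site
    avoids editing files under site-packages.
    """
    def decorator(fn):
        last_call = [0.0]  # monotonic timestamp of the previous call
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# usage sketch: pass the wrapped function into EmbeddingFunc, e.g.
# slow_embed = throttle(1.0)(my_embedding_func)  # >= 1s between requests
```

This spaces requests out under a requests-per-minute limit without touching `_base_client.py`.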

Arslan-Mehmood1 (Author) commented

Where is this file? `openai/_base_client.py` (line 1611, in `_request`)
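It lives inside the virtualenv's site-packages (the full path appears in the traceback above, e.g. `/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py`). A quick way to locate any installed module's file, sketched with the standard library's `importlib`:

```python
import importlib.util

def locate_module(name: str) -> str:
    """Return the filesystem path of an importable module, or raise."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(name)
    return spec.origin

# e.g. print(locate_module("openai._base_client")) should show a path like
# .../site-packages/openai/_base_client.py (requires openai to be installed)
```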
