
openai ratelimit error and ollama too slow on cpu #293

Open
Arslan-Mehmood1 opened this issue Nov 17, 2024 · 3 comments

Comments


Arslan-Mehmood1 commented Nov 17, 2024

code:

```python
import os
import time
from tqdm.auto import tqdm

os.environ['OPENAI_API_KEY'] = 'abc'

import pdfplumber
from lightrag import LightRAG, QueryParam
from lightrag.llm import ollama_model_complete, ollama_embedding
from lightrag.utils import EmbeddingFunc

from lightrag.llm import gpt_4o_mini_complete

WORKING_DIR = "../run_2"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# ========================================  OpenAI  ======================================== #
rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete,
    # llm_model_func=gpt_4o_complete
)
# ========================================  Ollama  ======================================== #
# rag = LightRAG(
#    working_dir=WORKING_DIR,
#    chunk_token_size=1200,
#    llm_model_func=ollama_model_complete,
#    llm_model_name="llama3.2:1b",
#    llm_model_max_async=1,
#    llm_model_max_token_size=32768,
#    llm_model_kwargs={"host": "http://localhost:11434", "options": {"num_ctx": 32768}},
#    embedding_func=EmbeddingFunc(
#        embedding_dim=768,
#        max_token_size=8192,
#        func=lambda texts: ollama_embedding(texts, embed_model="nomic-embed-text", host="http://localhost:11434"),
#    ),
# )


pdf_path = "../CompaniesAct2013.pdf"  # or Constitution_of_India.pdf
pdf_text = ""
with pdfplumber.open(pdf_path) as pdf:
    # pages[14:20] is 6 pages, so total=6 (not total=2)
    for page in tqdm(pdf.pages[14:20], desc='Extract Text from pages', total=6):
        # extract_text() can return None on empty pages
        pdf_text += (page.extract_text() or "") + "\n"


rag.insert(pdf_text)
```


```
INFO:lightrag:Logger initialized for working directory: ../run_2
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../run_2/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../run_2/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../run_2/vdb_chunks.json'} 0 data
Extract Text from pages: 6it [00:02, 2.67it/s]
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 5 chunks
INFO:lightrag:Inserting 5 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.449477 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.878241 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.389104 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.882345 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.469158 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:openai._base_client:Retrying request to /embeddings in 0.888185 seconds
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 429 Too Many Requests"
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/llm.py", line 550, in openai_embedding
    response = await openai_async_client.embeddings.create(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/resources/embeddings.py", line 236, in create
    return await self._post(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1839, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1533, in request
    return await self._request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1619, in _request
    return await self._retry_request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1666, in _retry_request
    return await self._request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1619, in _request
    return await self._retry_request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1666, in _retry_request
    return await self._request(
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py", line 1634, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/1_run_ollama.py", line 63, in <module>
    rag.insert(pdf_text)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/lightrag.py", line 227, in insert
    return loop.run_until_complete(self.ainsert(string_or_strings))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/lightrag.py", line 271, in ainsert
    await self.chunks_vdb.upsert(inserting_chunks)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/storage.py", line 98, in upsert
    embeddings_list = await asyncio.gather(
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/utils.py", line 89, in wait_func
    result = await func(*args, **kwargs)
  File "/home/arslan/Data/Learning/side_projects/LighRAG_legal_doc/LightRAG/lightrag/utils.py", line 45, in __call__
    return await self.func(*args, **kwargs)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/tenacity/__init__.py", line 419, in exc_check
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x73ece6a8e8c0 state=finished raised RateLimitError>]
```

@Arslan-Mehmood1 Arslan-Mehmood1 changed the title openai ratelimit error openai ratelimit error and ollama too slow on cpu Nov 17, 2024
LarFii (Collaborator) commented Nov 19, 2024

This is indeed a challenging issue to avoid. However, you can add a delay and retry the insertion process after encountering a rate limit error with OpenAI. Since already extracted content is cached, it won't be reprocessed. Unfortunately, for local models, we currently don't have an effective way to significantly improve the speed on CPU.
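The delay-and-retry approach above can be sketched as a small wrapper around `rag.insert`. The helper `insert_with_retry` below is hypothetical (not part of LightRAG); it assumes the `rag.insert(text)` call shown in the original snippet and relies on LightRAG's cache so that a retried insert does not redo finished work:

```python
import time

def insert_with_retry(insert_fn, text, max_attempts=5, base_delay=10.0):
    """Call insert_fn(text), sleeping with exponential backoff between failures.

    Hypothetical helper, not a LightRAG API. Because already-extracted
    content is cached, re-running the insert skips completed work.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return insert_fn(text)
        except Exception as exc:  # e.g. openai.RateLimitError / tenacity.RetryError
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)

# usage (assuming `rag` and `pdf_text` from the snippet above):
# insert_with_retry(rag.insert, pdf_text)
```

Note this only helps with per-minute rate limits; the `insufficient_quota` error in the traceback above means the account has no remaining credit, and no amount of retrying fixes that.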

m-aliabbas commented

Facing a rate limit (per minute) issue.
I added a `time.sleep` call in `openai/_base_client.py` (line 1611, in `_request`):

```python
    def _make_status_error_from_response(
        self,
        response: httpx.Response,
    ) -> APIStatusError:
        if response.is_closed and not response.is_stream_consumed:
            # We can't read the response body as it has been closed
            # before it was read. This can happen if an event hook
            # raises a status error.
            body = None
            err_msg = f"Error code: {response.status_code}"
        else:
            err_text = response.text.strip()
            body = err_text

            try:
                body = json.loads(err_text)
                err_msg = f"Error code: {response.status_code} - {body}"
            except Exception:
                err_msg = err_text or f"Error code: {response.status_code}"
        time.sleep(1.0)  # added line: crude 1s pause whenever a status error is built
        return self._make_status_error(err_msg, body=body, response=response)
```
and it's working.
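Patching installed library source works, but the edit is lost on every reinstall or upgrade. A less invasive alternative is to throttle your own embedding calls before they reach the OpenAI client. The `throttle` decorator below is a hypothetical stdlib-only sketch (not a LightRAG or OpenAI API); you would wrap whatever embedding function you pass to LightRAG's `EmbeddingFunc`:

```python
import functools
import time

def throttle(min_interval: float):
    """Decorator: enforce at least `min_interval` seconds between calls.

    Hypothetical helper. Wrapping your own function at the call site
    avoids editing files under site-packages.
    """
    def decorator(fn):
        last_call = [0.0]  # monotonic timestamp of the previous call
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# usage sketch: pass the wrapped function into EmbeddingFunc, e.g.
# slow_embed = throttle(1.0)(my_embedding_func)  # >= 1s between requests
```

This spaces requests out under a requests-per-minute limit without touching `_base_client.py`.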

Arslan-Mehmood1 (Author) commented

Where is this file? `openai/_base_client.py` (line 1611, in `_request`)
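It lives inside the virtualenv's site-packages (the full path appears in the traceback above, e.g. `/home/arslan/.virtualenvs/lightrag/lib/python3.10/site-packages/openai/_base_client.py`). A quick way to locate any installed module's file, sketched with the standard library's `importlib`:

```python
import importlib.util

def locate_module(name: str) -> str:
    """Return the filesystem path of an importable module, or raise."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(name)
    return spec.origin

# e.g. print(locate_module("openai._base_client")) should show a path like
# .../site-packages/openai/_base_client.py (requires openai to be installed)
```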
