[Bug] detokenize_incrementally: OverflowError: out of range integral type conversion attempted #1739
Comments
Please provide the client code that can be used for reproduction, thanks.
I will do my best to get a reliable reproduction of this detokenize issue. Possibly not related, since there don't seem to be any tokenizer issues in the logs, but maybe worth referencing since it has the same "an illegal memory access" message:
@zhyncs It can be reproduced like this:
wrk.method = "POST"
wrk.body = [[
{
"model": "yi",
"temperature": 0.7,
"messages": [
{
"role": "user",
"content": "worker_rlimit_nofile 是一个在 Nginx 或其他基于 Unix-like 系统的 Web 服务器配置中的指令,用于设置工作进程可以打开的最大文件描述符数。这个设置对于服务器性能有重要影响,因为它决定了服务器可以同时处理多少个并发连接。在这里,655350 是设置的具体数值。这个数值设置的相当高,意味着服务器配置了非常高的并发处理能力。在 Unix-like 系统中,文件描述符用于访问所有类型的文件,包括网络套接字。因此,增加这个限制可以让服务器处理更多的并发请求,特别是对于需要处理大量静态文件或者提供大量 Web 服务的场景。设置这个值通常需要服务器管理员有适当的权限,并且可能需要在系统级别进行相应的调整,因为操作系统也有自己的限制。在实际应用中,服务器管理员需要根据服务器的硬件资源、预期的负载以及实际的应用场景来合理设置这个值,以确保服务器既能充分利用资源,又不会因为超过系统限制而导致性能问题。"
}
],
"stream": false,
"max_tokens": 0
}]]
wrk.headers["Content-Type"] = "application/json"
The model being deployed is:
These all feel like the same issue: when the content is very long and the concurrency is high, the problem is triggered.
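For anyone who wants to hit the server without wrk, below is a rough Python equivalent of the reproduction above: it posts the same kind of long-content payload with many concurrent requests. The URL, port, prompt stand-in, and concurrency level are my assumptions, not values from this issue; adjust them to match your deployment.

```python
# Rough Python stand-in for the wrk reproduction above (assumed values marked below).
import concurrent.futures

import requests

URL = "http://127.0.0.1:23333/v1/chat/completions"  # assumed api_server address/port
PAYLOAD = {
    "model": "yi",
    "temperature": 0.7,
    # Stand-in for the long prompt used in the wrk script above.
    "messages": [{"role": "user", "content": "worker_rlimit_nofile ... " * 300}],
    "stream": False,
    "max_tokens": 0,
}


def send_one(_):
    # Return the status code (or the exception) so the pool keeps going on errors.
    try:
        resp = requests.post(URL, json=PAYLOAD, timeout=120)
        return resp.status_code
    except Exception as exc:
        return repr(exc)


# High concurrency seems to be part of the trigger, so use a large worker pool.
with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
    for result in pool.map(send_one, range(2000)):
        print(result)
```

If the server fails the same way under this load, the trigger is likely the long prompt plus high concurrency rather than wrk itself.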
@lzhangzz could you please investigate this issue?
Checklist
Describe the bug
Most API requests are successful, but the error in the title randomly occurs sometimes. I haven't worked out how to reliably reproduce it yet. I'm not doing anything special: just serving a Llama 2 70B AWQ model that I created with lmdeploy lite auto_awq on a dual 4090 machine. The same model works fine in vLLM.
Reproduction
Seems to occur randomly when many concurrent requests are sent. I will update this issue if I find any way to reproduce it consistently. Here are the arguments:
I'm using the /v1/completions endpoint (not the chat completions endpoint). The tokenizer class is LlamaTokenizer. I'm hoping that the error log below sparks an idea of what the cause could be 🤞
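For what it's worth, the exact message "out of range integral type conversion attempted" is what the Rust-backed (fast) HuggingFace tokenizer raises when it is asked to look up a token id outside the unsigned 32-bit range, e.g. a negative or garbage id. Below is a minimal sketch of that hypothesis, assuming detokenize_incrementally ends up passing such an id to convert_ids_to_tokens; the tokenizer repo used here is just a publicly available stand-in.

```python
# Minimal sketch of the suspected failure mode: an out-of-range token id
# reaching a fast (Rust-backed) tokenizer. Assumes the stand-in repo below;
# any LlamaTokenizerFast should behave the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

for bad_id in (-1, 2**32):  # ids outside the u32 range the Rust bindings accept
    try:
        tokenizer.convert_ids_to_tokens([bad_id])
    except OverflowError as err:
        print(bad_id, "->", err)  # out of range integral type conversion attempted
```

If that is what is happening here, the corrupted ids would most likely come from the upstream CUDA error ("an illegal memory access") mentioned earlier rather than from the tokenizer itself.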
Environment
Latest official Docker image: openmmlab/lmdeploy:v0.4.2
Error traceback