[Bug] hang when many requests #1619
Comments
Tried with
#1789 might resolve the issue. Please give it a try.
I have encountered the same error: for offline inference, when the batch size (BS) is large, the inference process sometimes hangs. Device 0 shows 0% GPU-Util while device 1 stays at 100% GPU-Util. Thu Jun 27 01:39:23 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA L20 On | 00000000:87:00.0 Off | 0 |
| N/A 48C P0 100W / 350W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA L20 On | 00000000:88:00.0 Off | 0 |
| N/A 48C P0 94W / 350W | 0MiB / 46068MiB | 100% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|

@AllentDan Hello, I've seen a similar issue describing this problem. Is there any way to resolve it at the moment? Our tests show the problem does not occur with driver 550, but it does with 535, and we have to run on 535.
This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% GPU-Util is only a display issue; nothing is actually running.
I'm running InternVL with turbomind, but looking at the implementation, the ViT runs on torch while the LLM runs on turbomind, right?
Thanks for the reply, I'll give it a test.
That should be a different problem from yours. Do you have a simple reproduction script? Driver 535 does seem to be a bit problematic.
I built the latest lmdeploy and the hang no longer occurs.
Great, closing this issue then.
@AllentDan After testing many more times, the hang still occurs intermittently, just with a lower probability, roughly 1/25: InternVL 1.5, BS=16, driver 535, offline inference, 2x L20. And it always hangs on the very first batch.
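For reference, a minimal sketch of the offline-inference workload described above, assuming lmdeploy's `pipeline` API for vision-language models; the checkpoint name, image path, and prompt are placeholders, not the reporter's actual script.

```python
# Minimal sketch (not the reporter's script) of the workload described above,
# assuming lmdeploy's pipeline API for VLMs. The checkpoint name and image
# path are placeholders; tp=2 mirrors the 2x L20 setup, batch size 16.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5',            # placeholder model
                backend_config=TurbomindEngineConfig(tp=2))

image = load_image('demo.jpg')                             # placeholder image
batch = [('describe this image', image)] * 16              # BS=16

# On driver 535 the reporter saw this call hang intermittently
# (roughly 1 in 25 runs), always on the first batch.
responses = pipe(batch)
for r in responses:
    print(r.text)
```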
Checklist
Describe the bug
Like #1198, it hangs after many requests when session_id is not set.
It seems to hang while waiting in self.get_generator(False, session_id).
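Below is a minimal sketch, not the reporter's code, of the many-request pattern being described, assuming the engine returned by lmdeploy's `pipeline()` exposes the async `generate(messages, session_id)` generator that internally waits in `get_generator()`; the model name, prompt, and request count are placeholders.

```python
# Hedged sketch of issuing many requests against one engine, assuming the
# object returned by pipeline() exposes AsyncEngine.generate(messages, session_id)
# as an async generator (the code path that waits in get_generator()).
# Model name, prompt, and request count are placeholders.
import asyncio
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b')   # placeholder model

async def one_request(session_id: int):
    # Give every request its own session_id; the hang reported here shows up
    # when many requests are in flight and generate() blocks in get_generator().
    async for _out in pipe.generate('hello', session_id=session_id):
        pass

async def main():
    await asyncio.gather(*(one_request(i) for i in range(200)))

asyncio.run(main())
```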
Reproduction
Environment
Error traceback