[Bug] hang when many requests #1619

Closed
2 tasks done
NiuBlibing opened this issue May 20, 2024 · 10 comments
@NiuBlibing (Contributor) commented May 20, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

Like #1198, the server hangs after many requests when no session_id is set.
It appears to hang while waiting in self.get_generator(False, session_id).
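
For context, the traceback below ends in a polling loop inside AsyncEngine.get_generator (async_engine.py:369). A minimal sketch of that pattern (simplified and hypothetical, not the actual lmdeploy code) shows how it can wait forever once the pool of free generators is exhausted, e.g. if aborted requests never return their generator:

import asyncio

class GeneratorPool:
    """Hypothetical sketch of the generator-pool polling pattern from the traceback."""

    def __init__(self, num_instances: int):
        # each entry stands for a free turbomind generator instance
        self.free_gens = list(range(num_instances))

    async def get_generator(self, stop: bool, session_id: int):
        # Poll until a free generator exists. If aborted requests never
        # release theirs, this loop never exits and new requests "hang" here.
        while not self.free_gens:
            await asyncio.sleep(0.1)
        return self.free_gens.pop()

    def release(self, gen) -> None:
        # Must run even when the client disconnects (e.g. the benchmark is
        # killed with Ctrl+C); otherwise the pool drains permanently.
        self.free_gens.append(gen)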

Reproduction

  1. lmdeploy serve api_server Qwen/Qwen1.5-72B-Chat/ --cache-max-entry-count 0.9 --tp 4 --session-len 32768
  2. Start the benchmark from PR #1607 ([benchmark] optimize benchmark: counting tokenlizer tokens and error requests) at 2048 concurrency without sending session_id, then kill the benchmark script manually (Ctrl+C); a minimal client sketch is given below.
  3. Repeat step 2 a few times.
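
A minimal client sketch for step 2 (this is not the benchmark from #1607; the URL, port, and model name are assumptions and must match your api_server deployment):

import asyncio
import aiohttp

URL = "http://127.0.0.1:23333/v1/chat/completions"   # assumed default api_server port
PAYLOAD = {
    "model": "Qwen/Qwen1.5-72B-Chat",                # assumed served model name
    "messages": [{"role": "user", "content": "Hello"}],
    # note: no session_id is sent, matching the reproduction
}

async def one_request(session: aiohttp.ClientSession) -> None:
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.read()

async def main(concurrency: int = 2048) -> None:
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(one_request(session) for _ in range(concurrency)))

if __name__ == "__main__":
    # Abort with Ctrl+C mid-run and re-run a few times to mimic the report.
    asyncio.run(main())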

Environment

sys.platform: linux
Python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.3, V12.3.107
GCC: gcc (Debian 12.2.0-14) 12.2.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

LMDeploy: 0.4.1+
transformers: 4.41.0
gradio: Not Found
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.2.0

Error traceback

ERROR:    Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 741, in lifespan
    await receive()
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/lifespan/on.py", line 137, in receive
    return await self.receive_queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/api_server.py", line 489, in chat_completions_v1
    async for res in result_generator:
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 615, in generate
    generator = await self.get_generator(False, session_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 369, in get_generator
    await asyncio.sleep(0.1)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError
INFO:     10.18.200.194:57340 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/api_server.py", line 489, in chat_completions_v1
    async for res in result_generator:
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 615, in generate
    generator = await self.get_generator(False, session_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 369, in get_generator
    await asyncio.sleep(0.1)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError
Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 283, in api_server
    run_api_server(args.model_path,
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/api_server.py", line 1222, in serve
    uvicorn.run(app=app,
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/main.py", line 575, in run
    server.run()
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt
@AllentDan (Collaborator)

Tried with lmdeploy serve api_server Qwen1.5-110B-Chat --cache-max-entry-count 0.9 --tp 8 --session-len 32768 but could not reproduce the issue yet.

@AllentDan (Collaborator)

#1789 might resolve the issue. Please give it a try.

@DefTruth (Contributor) commented Jun 27, 2024

I have encountered the same issue with offline inference when the batch size (BS) is large: the inference process sometimes hangs. Device 0 shows 0% GPU utilization while device 1 stays at 100%.

Thu Jun 27 01:39:23 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L20                     On  | 00000000:87:00.0 Off |                    0 |
| N/A   48C    P0             100W / 350W |      0MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L20                     On  | 00000000:88:00.0 Off |                    0 |
| N/A   48C    P0              94W / 350W |      0MiB / 46068MiB |      100%    Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|

@AllentDan Hi, I have seen similar issues reported. Is there currently a way to fix this? Testing shows the problem does not occur with driver 550 but does occur with driver 535, and we have to run on 535.

@AllentDan (Collaborator)

This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% utilization is only a display artifact; nothing is actually running.

@DefTruth (Contributor)

This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% utilization is only a display artifact; nothing is actually running.

I am running InternVL with turbomind, but looking at the implementation, the ViT runs on torch while the LLM runs on turbomind, right?

@DefTruth (Contributor)

This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% utilization is only a display artifact; nothing is actually running.

Thanks for the reply, I will test it.

@AllentDan (Collaborator)

That is probably not the same issue as yours. Do you have a simple reproduction script? Driver 535 does seem to have some problems.

@DefTruth (Contributor)

That is probably not the same issue as yours. Do you have a simple reproduction script? Driver 535 does seem to have some problems.

I built the latest lmdeploy and the hang no longer occurs.

@AllentDan (Collaborator)

No problem, closing this issue then.

@DefTruth (Contributor) commented Jun 27, 2024

@AllentDan After many test runs, it still occasionally hangs, just with a lower probability of roughly 1 in 25: InternVL 1.5, BS=16, driver 535, offline inference, 2x L20. And it always hangs on the very first batch.
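
For reference, a minimal offline-inference sketch of that workload, in case it helps reproduce (the model path is an assumption; tp and batch size are taken from the description above, so this is not the reporter's exact script):

# Offline batch inference sketch; adapt the model path and tp to your setup.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "OpenGVLab/InternVL-Chat-V1-5",              # assumed model
    backend_config=TurbomindEngineConfig(tp=2),  # 2x L20 as in the report
)

prompts = ["Describe the image in detail."] * 16  # BS=16 as in the report
responses = pipe(prompts)
for r in responses:
    print(r.text)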
