[Bug] hang when many requests #1619

Closed
2 tasks done
NiuBlibing opened this issue May 20, 2024 · 10 comments
@NiuBlibing (Contributor) commented May 20, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

Like #1198, the server hangs after many requests when no session_id is set.
It appears to hang while waiting in self.get_generator(False, session_id).
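
For context, the traceback below ends in a polling loop inside AsyncEngine.get_generator (async_engine.py:369). A minimal sketch of that pattern (simplified and hypothetical, not the actual lmdeploy code) shows how it can wait forever once the pool of free generators is exhausted, e.g. if aborted requests never return their generator:

import asyncio

class GeneratorPool:
    """Hypothetical sketch of the generator-pool polling pattern from the traceback."""

    def __init__(self, num_instances: int):
        # each entry stands for a free turbomind generator instance
        self.free_gens = list(range(num_instances))

    async def get_generator(self, stop: bool, session_id: int):
        # Poll until a free generator exists. If aborted requests never
        # release theirs, this loop never exits and new requests "hang" here.
        while not self.free_gens:
            await asyncio.sleep(0.1)
        return self.free_gens.pop()

    def release(self, gen) -> None:
        # Must run even when the client disconnects (e.g. the benchmark is
        # killed with Ctrl+C); otherwise the pool drains permanently.
        self.free_gens.append(gen)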

Reproduction

  1. lmdeploy serve api_server Qwen/Qwen1.5-72B-Chat/ --cache-max-entry-count 0.9 --tp 4 --session-len 32768
  2. Start the benchmark from PR #1607 ([benchmark] optimize benchmark: counting tokenlizer tokens and error requests) at 2048 concurrency without sending session_id, then kill the benchmark script manually (Ctrl+C); a minimal client sketch is given below.
  3. Repeat step 2 a few times.
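
A minimal client sketch for step 2 (this is not the benchmark from #1607; the URL, port, and model name are assumptions and must match your api_server deployment):

import asyncio
import aiohttp

URL = "http://127.0.0.1:23333/v1/chat/completions"   # assumed default api_server port
PAYLOAD = {
    "model": "Qwen/Qwen1.5-72B-Chat",                # assumed served model name
    "messages": [{"role": "user", "content": "Hello"}],
    # note: no session_id is sent, matching the reproduction
}

async def one_request(session: aiohttp.ClientSession) -> None:
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.read()

async def main(concurrency: int = 2048) -> None:
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(one_request(session) for _ in range(concurrency)))

if __name__ == "__main__":
    # Abort with Ctrl+C mid-run and re-run a few times to mimic the report.
    asyncio.run(main())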

Environment

sys.platform: linux
Python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.3, V12.3.107
GCC: gcc (Debian 12.2.0-14) 12.2.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

LMDeploy: 0.4.1+
transformers: 4.41.0
gradio: Not Found
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.2.0

Error traceback

ERROR:    Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 741, in lifespan
    await receive()
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/lifespan/on.py", line 137, in receive
    return await self.receive_queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/api_server.py", line 489, in chat_completions_v1
    async for res in result_generator:
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 615, in generate
    generator = await self.get_generator(False, session_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 369, in get_generator
    await asyncio.sleep(0.1)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError
INFO:     10.18.200.194:57340 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/api_server.py", line 489, in chat_completions_v1
    async for res in result_generator:
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 615, in generate
    generator = await self.get_generator(False, session_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/async_engine.py", line 369, in get_generator
    await asyncio.sleep(0.1)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError
Traceback (most recent call last):
  File "/home/internlm/.conda/envs/lmdeploy/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/cli/serve.py", line 283, in api_server
    run_api_server(args.model_path,
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/lmdeploy/serve/openai/api_server.py", line 1222, in serve
    uvicorn.run(app=app,
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/main.py", line 575, in run
    server.run()
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    with self.capture_signals():
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/site-packages/uvicorn/server.py", line 328, in capture_signals
    signal.raise_signal(captured_signal)
  File "/home/internlm/.conda/envs/lmdeploy/lib/python3.11/asyncio/runners.py", line 157, in _on_sigint
    raise KeyboardInterrupt()
KeyboardInterrupt
@AllentDan (Collaborator)

Tried with lmdeploy serve api_server Qwen1.5-110B-Chat --cache-max-entry-count 0.9 --tp 8 --session-len 32768 but could not reproduce the issue yet.

@AllentDan (Collaborator)

#1789 might resolve the issue. Please give it a try.

@DefTruth (Contributor) commented Jun 27, 2024

I have encountered the same issue with offline inference when the batch size (BS) is large: the inference process sometimes hangs. Device 0 shows 0% GPU utilization while device 1 stays at 100%.

Thu Jun 27 01:39:23 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L20                     On  | 00000000:87:00.0 Off |                    0 |
| N/A   48C    P0             100W / 350W |      0MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L20                     On  | 00000000:88:00.0 Off |                    0 |
| N/A   48C    P0              94W / 350W |      0MiB / 46068MiB |      100%    Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|

@AllentDan Hi, I have seen similar issues reported. Is there currently a way to fix this? Testing shows the problem does not occur with driver 550 but does occur with driver 535, and we have to run on 535.

@AllentDan (Collaborator)

This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% utilization is only a display artifact; nothing is actually running.

@DefTruth (Contributor)

This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% utilization is only a display artifact; nothing is actually running.

I am running InternVL with turbomind, but looking at the implementation, the ViT runs on torch while the LLM runs on turbomind, right?

@DefTruth (Contributor)

This has already been fixed on main, see #1848 (comment). Are you hitting it with the turbomind engine or the pytorch engine? The 100% utilization is only a display artifact; nothing is actually running.

Thanks for the reply, I will test it.

@AllentDan (Collaborator)

That is probably not the same issue as yours. Do you have a simple reproduction script? Driver 535 does seem to have some problems.

@DefTruth (Contributor)

That is probably not the same issue as yours. Do you have a simple reproduction script? Driver 535 does seem to have some problems.

I built the latest lmdeploy and the hang no longer occurs.

@AllentDan (Collaborator)

No problem, closing this issue then.

@DefTruth (Contributor) commented Jun 27, 2024

@AllentDan After many test runs, it still occasionally hangs, just with a lower probability of roughly 1 in 25: InternVL 1.5, BS=16, driver 535, offline inference, 2x L20. And it always hangs on the very first batch.
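
For reference, a minimal offline-inference sketch of that workload, in case it helps reproduce (the model path is an assumption; tp and batch size are taken from the description above, so this is not the reporter's exact script):

# Offline batch inference sketch; adapt the model path and tp to your setup.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "OpenGVLab/InternVL-Chat-V1-5",              # assumed model
    backend_config=TurbomindEngineConfig(tp=2),  # 2x L20 as in the report
)

prompts = ["Describe the image in detail."] * 16  # BS=16 as in the report
responses = pipe(prompts)
for r in responses:
    print(r.text)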
