Issues: vllm-project/vllm
Issues list
[Feature]: API for evicting all KV cache from GPU memory (or sleep mode)
feature request · #10714 · opened Nov 27, 2024 by HollowMan6
[Usage]: vllm infer with 2 * Nvidia-L20, output repeat !!!!
usage · #10713 · opened Nov 27, 2024 by RoyaltyLJW
[Bug]: Unsloth bitsandbytes quantized model cannot be run due to: KeyError: 'layers.42.mlp.down_proj.weight.absmax'
bug · #10710 · opened Nov 27, 2024 by kerem-coemert
[Performance]: [V1] Increasing the request batch size causes a significant drop in performance.
performance · #10709 · opened Nov 27, 2024 by lixiaolx
[Misc]: nsys profile can not show CUDA HW on all devices
misc · #10708 · opened Nov 27, 2024 by irasin
[Performance]: Unified flashattn kernel not outperforming current one
performance · #10707 · opened Nov 27, 2024 by NickLucche
[Bug]: VLLM run very very slow in ARM cpu
bug · #10706 · opened Nov 27, 2024 by feikiss
[Bug]: load llama 70B more than 10min, is that right?
bug · #10702 · opened Nov 27, 2024 by ltm920716
[Usage]: 4 Bit Finetuned Mistral Model
usage · #10697 · opened Nov 27, 2024 by anandmahato
[Feature]: frequency_penalties is missing in V1
feature request · #10696 · opened Nov 27, 2024 by feifan-sun
[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots
bug · #10693 · opened Nov 27, 2024 by fabianlim
[Performance]: We need more processes to fully utilize the multi-core capabilities of the CPU.
performance · #10690 · opened Nov 27, 2024 by wciq1208
[Bug]: CPU Docker build fail.
bug · #10689 · opened Nov 27, 2024 by Zhenzhong1
[Usage]: How to get the logits for each output token, instead of the post-softmax logprobs?
usage · #10688 · opened Nov 27, 2024 by TonyUSTC
[Bug]: v0.6.4.post1 Qwen2-VL-7B-Instruct-AWQ crash: shape mismatch
bug · #10686 · opened Nov 27, 2024 by wciq1208
[Bug]: AssertError when testing VLLM performance with preempt mode set to "swap" and VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1
bug · #10685 · opened Nov 27, 2024 by Minamoto25
[RFC]: Make any model an embedding model
RFC · #10674 · opened Nov 26, 2024 by DarkLight1337
[Usage]: Llama-2-7b-chat-hf as embedding model
usage · #10673 · opened Nov 26, 2024 by ra-MANUJ-an
[Usage]: No Generation When Running VLLM with neuralmagic/Meta-Llama-3.1-8b-Instruct-quantized.w4a16 Using langchain_openai
usage · #10671 · opened Nov 26, 2024 by ehab-akram
[Usage]: how to get every output token score?
usage · #10670 · opened Nov 26, 2024 by TonyUSTC
[Bug]: When using Ray as the inference backend for Qwen2-VL, there are issues with the inference results.
bug · #10668 · opened Nov 26, 2024 by my17th2
[RFC]: Create VllmState to save immutable args in VllmConfig
RFC · #10666 · opened Nov 26, 2024 by MengqingCao
[Performance]: There is a 10x performance gap between the lora-modules deployment model and the Merge deployment model
performance · #10664 · opened Nov 26, 2024 by LIUKAI0815
[Installation]: Request for a Solution to Enable Llama 3.1 405B-FP8 Model Compatibility with AMD Mi250
installation · #10663 · opened Nov 26, 2024 by Bihan
[Usage]: Cannot use xformers with old GPU
usage · #10662 · opened Nov 26, 2024 by baimushan