Issues: vllm-project/vllm
Issues list
[Feature]: API for evicting all KV cache from GPU memory (or sleep mode)
feature request · #10714 · opened Nov 27, 2024 by HollowMan6
[Usage]: vllm infer with 2 * Nvidia-L20, output repeat !!!!
usage · #10713 · opened Nov 27, 2024 by RoyaltyLJW
[Bug]: Unsloth bitsandbytes quantized model cannot be run due to: KeyError: 'layers.42.mlp.down_proj.weight.absmax'
bug · #10710 · opened Nov 27, 2024 by kerem-coemert
[Performance]: [V1] Increasing the request batch size causes a significant drop in performance.
performance · #10709 · opened Nov 27, 2024 by lixiaolx
[Misc]: nsys profile can not show CUDA HW on all devices
misc · #10708 · opened Nov 27, 2024 by irasin
[Performance]: Unified flashattn kernel not outperforming current one
performance · #10707 · opened Nov 27, 2024 by NickLucche
[Bug]: VLLM run very very slow in ARM cpu
bug · #10706 · opened Nov 27, 2024 by feikiss
[Bug]: load llama 70B more than 10min, is that right?
bug · #10702 · opened Nov 27, 2024 by ltm920716
[Usage]: 4 Bit Finetuned Mistral Model
usage · #10697 · opened Nov 27, 2024 by anandmahato
[Feature]: frequency_penalties is missing in V1
feature request · #10696 · opened Nov 27, 2024 by feifan-sun
[Bug]: MambaCacheManager Can Possibly Run Out of Free Slots
bug · #10693 · opened Nov 27, 2024 by fabianlim
[Performance]: We need more processes to fully utilize the multi-core capabilities of the CPU.
performance · #10690 · opened Nov 27, 2024 by wciq1208
[Bug]: CPU Docker build fail.
bug · #10689 · opened Nov 27, 2024 by Zhenzhong1
[Usage]: How to get the logits for each output token, instead of the post-softmax logprobs?
usage · #10688 · opened Nov 27, 2024 by TonyUSTC
[Bug]: v0.6.4.post1 Qwen2-VL-7B-Instruct-AWQ crash: shape mismatch
bug · #10686 · opened Nov 27, 2024 by wciq1208
[Bug]: AssertError when testing VLLM performance with preempt mode set to "swap" and VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1
bug · #10685 · opened Nov 27, 2024 by Minamoto25
[RFC]: Make any model an embedding model
RFC · #10674 · opened Nov 26, 2024 by DarkLight1337
[Usage]: Llama-2-7b-chat-hf as embedding model
usage · #10673 · opened Nov 26, 2024 by ra-MANUJ-an
[Usage]: No Generation When Running VLLM with neuralmagic/Meta-Llama-3.1-8b-Instruct-quantized.w4a16 Using langchain_openai
usage · #10671 · opened Nov 26, 2024 by ehab-akram
[Usage]: how to get every output token score?
usage · #10670 · opened Nov 26, 2024 by TonyUSTC
[Bug]: When using Ray as the inference backend for Qwen2-VL, there are issues with the inference results.
bug · #10668 · opened Nov 26, 2024 by my17th2
[RFC]: Create VllmState to save immutable args in VllmConfig
RFC · #10666 · opened Nov 26, 2024 by MengqingCao
[Performance]: There is a 10x performance gap between the lora-modules deployment model and the Merge deployment model
performance · #10664 · opened Nov 26, 2024 by LIUKAI0815
[Installation]: Request for a Solution to Enable Llama 3.1 405B-FP8 Model Compatibility with AMD Mi250
installation · #10663 · opened Nov 26, 2024 by Bihan
[Usage]: Cannot use xformers with old GPU
usage · #10662 · opened Nov 26, 2024 by baimushan