LMDeploy Release V0.6.1
What's Changed
🚀 Features
- Support user-specified data type by @lvhan028 in #2473
- Support minicpm3-4b by @AllentDan in #2465
- support Qwen2-VL with pytorch backend by @irexyc in #2449
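For the user-specified data type feature (#2473), the core of any such option is resolving a user-requested dtype string against the model's default. This is a minimal illustrative sketch in plain Python, not LMDeploy's actual API; the function and constant names are hypothetical:

```python
# Hypothetical sketch of user-specified dtype resolution; names are
# illustrative only, not LMDeploy's real interface.
SUPPORTED_DTYPES = ("auto", "float16", "bfloat16")

def resolve_dtype(user_dtype: str, model_default: str = "bfloat16") -> str:
    """Map a user-requested dtype string to a concrete dtype.

    'auto' falls back to the model's default; anything else must be
    one of the supported dtype names.
    """
    if user_dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {user_dtype!r}")
    return model_default if user_dtype == "auto" else user_dtype

print(resolve_dtype("auto"))     # falls back to the model default
print(resolve_dtype("float16"))  # explicit request is kept as-is
```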
💥 Improvements
- Add silu mul kernel by @grimoire in #2469
- adjust schedule to improve TTFT in pytorch engine by @grimoire in #2477
- Add max_log_len option to control length of printed log by @lvhan028 in #2478
- Set served model name to the repo_id from the hub before the model is downloaded by @lvhan028 in #2494
- Improve proxy server usage by @AllentDan in #2488
- CudaGraph mixin by @grimoire in #2485
- Add get_logits to the pytorch engine by @grimoire in #2487
- Refactor lora by @grimoire in #2466
- Support non-aligned silu_and_mul by @grimoire in #2506
- optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way by @jiajie-yang in #2521
- Fix chatglm tokenizer failure when transformers>=4.45.0 by @AllentDan in #2520
🐞 Bug fixes
- Fix "TypeError: Got unsupported ScalarType BFloat16" by @SeitaroShinagawa in #2472
- fix ascend atten_mask by @yao-fengchen in #2483
- Catch exceptions thrown by turbomind inference thread by @lvhan028 in #2502
- The `get_ppl` missed the last token of each iteration during multi-iter prefill by @lvhan028 in #2499
- fix vl gradio by @irexyc in #2527
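To illustrate the class of bug fixed in #2499: when perplexity is computed over a long sequence in several prefill iterations, the logits at position i score token i+1, so each chunk's boundary token must be carried into the scoring window or it is silently skipped. The following is a hedged pure-Python sketch of the correct windowing, with hypothetical names, not LMDeploy's `get_ppl` code:

```python
def target_tokens_per_chunk(tokens, chunk_size):
    """Yield, per prefill iteration, the tokens whose likelihood that
    iteration scores (every token except the very first one)."""
    targets = []
    for start in range(0, len(tokens) - 1, chunk_size):
        # Logits for positions [start, start+chunk_size) score the
        # tokens at [start+1, start+chunk_size+1): the slice is shifted
        # by one, so each chunk's last token is scored, not dropped.
        targets.append(tokens[start + 1:start + chunk_size + 1])
    return targets

tokens = [1, 2, 3, 4, 5]
# With chunk_size=2 every token after the first is scored exactly once,
# including the token at each chunk boundary.
print(target_tokens_per_chunk(tokens, 2))  # -> [[2, 3], [4, 5]]
```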
🌐 Other
- [ci] regular update by @zhulinJulia24 in #2431
- [CI] add base model evaluation by @zhulinJulia24 in #2490
- bump version to v0.6.1 by @lvhan028 in #2513
New Contributors
- @SeitaroShinagawa made their first contribution in #2472
Full Changelog: v0.6.0...v0.6.1