LMDeploy Release V0.6.1
What's Changed
🚀 Features
- Support user-specified data type by @lvhan028 in #2473
- Support minicpm3-4b by @AllentDan in #2465
- support Qwen2-VL with pytorch backend by @irexyc in #2449
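For the user-specified data type feature (#2473), the core of any such option is resolving a user-requested dtype string against the model's default. This is a minimal illustrative sketch in plain Python, not LMDeploy's actual API; the function and constant names are hypothetical:

```python
# Hypothetical sketch of user-specified dtype resolution; names are
# illustrative only, not LMDeploy's real interface.
SUPPORTED_DTYPES = ("auto", "float16", "bfloat16")

def resolve_dtype(user_dtype: str, model_default: str = "bfloat16") -> str:
    """Map a user-requested dtype string to a concrete dtype.

    'auto' falls back to the model's default; anything else must be
    one of the supported dtype names.
    """
    if user_dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {user_dtype!r}")
    return model_default if user_dtype == "auto" else user_dtype

print(resolve_dtype("auto"))     # falls back to the model default
print(resolve_dtype("float16"))  # explicit request is kept as-is
```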
💥 Improvements
- Add silu mul kernel by @grimoire in #2469
- adjust schedule to improve TTFT in pytorch engine by @grimoire in #2477
- Add max_log_len option to control length of printed log by @lvhan028 in #2478
- Set served model name to the repo_id from the hub before the model is downloaded by @lvhan028 in #2494
- Improve proxy server usage by @AllentDan in #2488
- CudaGraph mixin by @grimoire in #2485
- Add get_logits to the pytorch engine by @grimoire in #2487
- Refactor lora by @grimoire in #2466
- Support non-aligned silu_and_mul by @grimoire in #2506
- optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way by @jiajie-yang in #2521
- Fix chatglm tokenizer failure when transformers>=4.45.0 by @AllentDan in #2520
🐞 Bug fixes
- Fix "TypeError: Got unsupported ScalarType BFloat16" by @SeitaroShinagawa in #2472
- fix ascend atten_mask by @yao-fengchen in #2483
- Catch exceptions thrown by turbomind inference thread by @lvhan028 in #2502
- The `get_ppl` missed the last token of each iteration during multi-iter prefill by @lvhan028 in #2499
- fix vl gradio by @irexyc in #2527
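To illustrate the class of bug fixed in #2499: when perplexity is computed over a long sequence in several prefill iterations, the logits at position i score token i+1, so each chunk's boundary token must be carried into the scoring window or it is silently skipped. The following is a hedged pure-Python sketch of the correct windowing, with hypothetical names, not LMDeploy's `get_ppl` code:

```python
def target_tokens_per_chunk(tokens, chunk_size):
    """Yield, per prefill iteration, the tokens whose likelihood that
    iteration scores (every token except the very first one)."""
    targets = []
    for start in range(0, len(tokens) - 1, chunk_size):
        # Logits for positions [start, start+chunk_size) score the
        # tokens at [start+1, start+chunk_size+1): the slice is shifted
        # by one, so each chunk's last token is scored, not dropped.
        targets.append(tokens[start + 1:start + chunk_size + 1])
    return targets

tokens = [1, 2, 3, 4, 5]
# With chunk_size=2 every token after the first is scored exactly once,
# including the token at each chunk boundary.
print(target_tokens_per_chunk(tokens, 2))  # -> [[2, 3], [4, 5]]
```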
🌐 Other
- [ci] regular update by @zhulinJulia24 in #2431
- [CI] add base model evaluation by @zhulinJulia24 in #2490
- bump version to v0.6.1 by @lvhan028 in #2513
New Contributors
- @SeitaroShinagawa made their first contribution in #2472
Full Changelog: v0.6.0...v0.6.1