Releases · InternLM/lmdeploy
LMDeploy Release V0.0.10
What's Changed
💥 Improvements
- [feature] Graceful termination of background threads in LlamaV2 by @akhoroshev in #458
- Expose stop words and filter `eoa` by @AllentDan in #352
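As a usage note, exposed stop words can be tried from Python. A minimal sketch using the present-day lmdeploy API, which postdates v0.0.10 and may differ from its interface; the model path and stop word are placeholders:

```python
from lmdeploy import pipeline, GenerationConfig

# Placeholder model; <eoa> is InternLM's end-of-answer token.
pipe = pipeline('internlm/internlm-chat-7b')
gen_config = GenerationConfig(stop_words=['<eoa>'])

# Generation halts as soon as a stop word is produced, and the
# stop word itself is filtered from the returned text.
response = pipe(['Hello, introduce yourself.'], gen_config=gen_config)
print(response[0].text)
```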
🐞 Bug fixes
- Fix side effect brought by supporting codellama: `sequence_start` is always true when calling `model.get_prompt` by @lvhan028 in #466
- Miss meta instruction of internlm-chat model by @lvhan028 in #470
- [bug] Fix race condition by @akhoroshev in #460
- Fix compatibility issues with Pydantic 2 by @aisensiy in #465 (see the sketch after this list)
- Fix benchmark serving failing to use the Qwen tokenizer by @AllentDan in #443
- Fix memory leak by @lvhan028 in #488
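For context on the Pydantic 2 fix above: Pydantic 2 renamed several v1 methods, a common source of breakage when upgrading. A generic illustration of the renames, with a hypothetical model, not lmdeploy code:

```python
from pydantic import BaseModel

class ChatRequest(BaseModel):  # hypothetical model for illustration
    prompt: str
    temperature: float = 0.8

req = ChatRequest(prompt='hi')
# Pydantic 1 spelled these .dict() and .parse_obj();
# Pydantic 2 renames them, breaking code written against v1:
data = req.model_dump()
req2 = ChatRequest.model_validate(data)
```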
📚 Documentations
- Fix typo in README.md by @eltociear in #462
New Contributors
- @eltociear made their first contribution in #462
- @akhoroshev made their first contribution in #458
- @aisensiy made their first contribution in #465
Full Changelog: v0.0.9...v0.0.10
LMDeploy Release V0.0.9
Highlight
- Support InternLM 20B, including FP16, W4A16, and W4KV8
What's Changed
💥 Improvements
- Reduce gil switching by @irexyc in #407
- Profile token generation with more settings by @AllentDan in #364
🐞 Bug fixes
- Fix disk space limit for building docker image by @RunningLeon in #404
- More general PyPI CI by @irexyc in #412
- Fix `build.md` by @pangsg in #411
- Fix memory leak by @irexyc in #415
- Fix token count bug by @AllentDan in #416
- [Fix] Support actual seqlen in flash-attention2 by @grimoire in #418 (see the sketch after this list)
- [Fix] output[-1] when output is empty by @wangruohui in #405
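The "actual seqlen" fix concerns variable-length batches. A sketch of how flash-attention 2's public varlen entry point takes real sequence lengths via cumulative offsets; the shapes are illustrative, and this is not turbomind's internal code:

```python
import torch
from flash_attn import flash_attn_varlen_func

# Two packed sequences of lengths 3 and 5: 8 tokens total,
# 8 heads, head_dim 64, packed without padding.
q = torch.randn(8, 8, 64, dtype=torch.float16, device='cuda')
k, v = torch.randn_like(q), torch.randn_like(q)

# Cumulative sequence lengths mark where each sequence starts and
# ends, so attention uses actual lengths instead of a padded maximum.
cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32, device='cuda')
out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
                             max_seqlen_q=5, max_seqlen_k=5, causal=True)
```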
🌐 Other
- rename readthedocs config file by @RunningLeon in #429
- bump version to v0.0.9 by @lvhan028 in #428
Full Changelog: v0.0.8...v0.0.9
LMDeploy Release V0.0.8
Highlights
- Support Baichuan2-7B-Base and Baichuan2-7B-Chat
- Support all features of Code Llama: code completion, infilling, chat / instruct, and python specialist
What's Changed
🚀 Features
- Support baichuan2-chat chat template by @wangruohui in #378
- Support codellama by @lvhan028 in #359
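The infilling capability rearranges the prompt around the insertion point. A sketch of Code Llama's fill-in-the-middle prompt layout as described in the Code Llama paper; lmdeploy assembles this internally, and the snippet only illustrates the format:

```python
# The model generates the code between prefix and suffix,
# emitting an <EOT> token when the middle is complete.
prefix = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n"
suffix = "\n    return quicksort(lo) + mid + quicksort(hi)"
infill_prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"
```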
🐞 Bug fixes
- [Fix] Continuous batching doesn't work when `stream` is False by @sleepwalker2017 in #346
- [Fix] Set max dynamic smem size for decoder MHA to support context length > 8k by @lvhan028 in #377
- Fix exceed session len core dump for chat and generate by @AllentDan in #366
- [Fix] update puyu model by @Harold-lkk in #399
📚 Documentations
- [Docs] Fix quantization docs link by @LZHgrla in #367
- [Docs] Simplify `build.md` by @pppppM in #370
- [Docs] Update lmdeploy logo by @lvhan028 in #372
New Contributors
- @sleepwalker2017 made their first contribution in #346
Full Changelog: v0.0.7...v0.0.8
LMDeploy Release V0.0.7
Highlights
- Flash attention 2 is supported, boosting context decoding speed by approximately 45%
- Token_id decoding has been optimized for better efficiency
- The gemm tuning script has been packed into the PyPI package
What's Changed
💥 Improvements
- Add `llama_gemm` to the wheel by @irexyc in #320
- Decode generated token_ids incrementally by @AllentDan in #309
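Incremental decoding emits only newly completed text and holds back half-decoded multi-byte characters. A simplified sketch of the idea, not lmdeploy's actual implementation, using a HuggingFace tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')  # any tokenizer works

def stream_decode(token_ids):
    """Yield only the newly completed text as token_ids grows."""
    emitted = 0
    for n in range(1, len(token_ids) + 1):
        text = tokenizer.decode(token_ids[:n], skip_special_tokens=True)
        # U+FFFD means the last token ends mid-character; wait for more.
        if not text.endswith('\ufffd') and len(text) > emitted:
            yield text[emitted:]
            emitted = len(text)

print(''.join(stream_decode(tokenizer.encode('Hello, world!'))))
```

Re-decoding the whole prefix each step is quadratic in sequence length; a production implementation would decode only a trailing window of tokens, but the hold-back invariant is the same.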
🐞 Bug fixes
- Fix turbomind import error on windows by @irexyc in #316
- Fix profile_serving hung issue by @lvhan028 in #344
📚 Documentations
- Fix readthedocs building by @RunningLeon in #321
- fix(kvint8): update doc by @tpoisonooo in #315
- Update FAQ for restful api by @AllentDan in #319
Full Changelog: v0.0.6...v0.0.7
LMDeploy Release V0.0.6
Highlights
- Support Qwen-7B with dynamic NTK scaling and logN scaling in turbomind (see the sketch after this list)
- Support tensor parallelism for W4A16
- Add OpenAI-like RESTful API
- Support Llama-2 70B 4-bit quantization
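For background on dynamic NTK scaling: the commonly used NTK-aware trick enlarges the RoPE base so longer contexts stay in-distribution; turbomind's dynamic variant, which adapts to the running sequence length, may differ in detail. A sketch of the formula:

```python
def ntk_scaled_base(base: float, head_dim: int, scale: float) -> float:
    # NTK-aware RoPE: raising the base slows the low-frequency
    # rotations so `scale`x longer contexts stay in-distribution.
    return base * scale ** (head_dim / (head_dim - 2))

# e.g. stretching a 2k-context model with head_dim=128 to 4k:
print(ntk_scaled_base(10000.0, 128, 2.0))  # ~20221
```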
What's Changed
🚀 Features
- Profiling tool for huggingface and deepspeed models by @wangruohui in #161
- Support windows platform by @irexyc in #209
- Qwen-7B, dynamic NTK scaling and logN scaling support in turbomind by @lzhangzz in #230
- Add Restful API by @AllentDan in #223 (see the example after this list)
- Support context decoding with DP in pytorch by @wangruohui in #193
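A hedged example of calling the OpenAI-like server. The route and payload follow the present-day api_server; v0.0.6 launched the server differently and its paths may have differed:

```python
import requests

# Assumes an lmdeploy api_server is listening on the default port
# and exposes the OpenAI-compatible chat completion route.
resp = requests.post(
    'http://0.0.0.0:23333/v1/chat/completions',
    json={
        'model': 'internlm-chat-7b',
        'messages': [{'role': 'user', 'content': 'Hello!'}],
    },
)
print(resp.json()['choices'][0]['message']['content'])
```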
💥 Improvements
- Support TP for W4A16 by @lzhangzz in #262
- Pass chat template args including meta_prompt to model (7785142) by @AllentDan in #225
- Enable the Gradio server to call inference services through the RESTful API by @AllentDan in #287
🐞 Bug fixes
- Adjust dependency of gradio server by @AllentDan in #236
- Implement `movmatrix` using warp shuffling for CUDA < 11.8 by @lzhangzz in #267
- Add 'accelerate' to requirement list by @lvhan028 in #261
- Fix building with CUDA 11.3 by @lzhangzz in #280
- Pad tok_embedding and output weights to make their shape divisible by TP by @lvhan028 in #285
- Fix llama2 70b & qwen quantization error by @pppppM in #273
- Import turbomind in gradio server only when it is needed by @AllentDan in #303
📚 Documentations
- Remove specified version in user guide by @lvhan028 in #241
- docs(quantization): update description by @tpoisonooo in #253 and #272
- Check-in FAQ by @lvhan028 in #256
- Add readthedocs by @RunningLeon in #208
🌐 Other
- Update workflow for building docker image by @RunningLeon in #282
- Change to github-hosted runner for building docker image by @RunningLeon in #291
Known issues
- 4-bit Qwen-7B model inference fails. #307 is addressing this issue.
Full Changelog: v0.0.5...v0.0.6
LMDeploy Release V0.0.5
LMDeploy Release V0.0.4
Highlight
- Support 4-bit LLM quantization and inference. Check this guide for detailed information.
What's Changed
🚀 Features
- Blazing fast W4A16 inference by @lzhangzz in #202
- Support AWQ by @pppppM in #108 and @AllentDan in #228
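For intuition on W4A16: weights are stored as 4-bit integers plus per-group fp16 scales and zero points, and dequantized to fp16 at compute time. A NumPy sketch of group-wise dequantization; the layouts are illustrative, not turbomind's packed format:

```python
import numpy as np

def dequant_w4(qweight, scales, zeros, group_size=128):
    """qweight: uint8 values in [0, 15], shape (out, in);
    scales/zeros: fp16, shape (out, in // group_size)."""
    w = qweight.astype(np.float16)
    n_groups = w.shape[1] // group_size
    for g in range(n_groups):
        cols = slice(g * group_size, (g + 1) * group_size)
        # Map each 4-bit code back to fp16 with its group's params.
        w[:, cols] = (w[:, cols] - zeros[:, g:g + 1]) * scales[:, g:g + 1]
    return w  # fp16 weights, ready for a regular fp16 GEMM
```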
💥 Improvements
- Add release note template by @lvhan028 in #211
- feat(quantization): use asymmetric quantization for kv cache by @tpoisonooo in #218
📚 Documentations
- Update W4A16 News by @pppppM in #227
- Check-in user guide for w4a16 LLM deployment by @lvhan028 in #224
Full Changelog: v0.0.3...v0.0.4
LMDeploy Release V0.0.3
What's Changed
🚀 Features
- Support tensor parallelism without offline splitting model weights by @grimoire in #158
- Add script to split HuggingFace model to the smallest sharded checkpoints by @LZHgrla in #199 (see the sketch after this list)
- Add non-stream inference api for chatbot by @lvhan028 in #200
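For reference, stock transformers can already reshard a checkpoint; the script from #199 may work differently. A sketch with a placeholder model name and shard size:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'internlm/internlm-chat-7b', trust_remote_code=True)
# save_pretrained reshards the weights into files of at most 2GB each.
model.save_pretrained('./sharded-ckpt', max_shard_size='2GB')
```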
💥 Improvements
- Add issue/pr templates by @lvhan028 in #184
- Remove unused code to reduce binary size by @lzhangzz in #181
- Support serving with gradio without communicating to TIS by @AllentDan in #162
- Improve postprocessing in TIS serving by applying Incremental de-tokenizing by @lvhan028 in #197
- Support multi-session chat by @wangruohui in #178
🐞 Bug fixes
- Fix build test error and move turbomind csrc test cases to `tests/csrc` by @lvhan028 in #188
- Fix launching client error by moving lmdeploy/turbomind/utils.py to lmdeploy/utils.py by @lvhan028 in #191
📚 Documentations
- Update README.md by @tpoisonooo in #187
- Translate turbomind.md by @xin-li-67 in #173
Full Changelog: v0.0.2...v0.0.3
LMDeploy Release V0.0.2
What's Changed
🚀 Features
- Add lmdeploy python package build scripts and CI workflow by @irexyc in #163, #164, #170
- Support LLama-2 with GQA by @lzhangzz in #147 and @grimoire in #160
- Add Llama-2 chat template by @grimoire in #140 (see the sketch after this list)
- Add decode-only forward pass by @lzhangzz in #153
- Support tensor parallelism in turbomind's python API by @grimoire in #82
- Support packed qkv weights (`w_pack`) by @tpoisonooo in #83
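For reference, the single-turn Llama-2 chat format such a template must reproduce, per Meta's reference implementation (multi-turn conversations wrap each exchange in `<s>`...`</s>`):

```python
B_INST, E_INST = '[INST]', '[/INST]'
B_SYS, E_SYS = '<<SYS>>\n', '\n<</SYS>>\n\n'

def llama2_prompt(system: str, user: str) -> str:
    # The system prompt is folded into the first user turn.
    return f'{B_INST} {B_SYS}{system}{E_SYS}{user} {E_INST}'

print(llama2_prompt('You are a helpful assistant.', 'Hi!'))
```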
💥 Improvements
- Refactor the chat template of supported models using factory pattern by @lvhan028 in #144 and @streamsunshine in #174
- Add profile throughput benchmark by @grimoire in #146
- Remove slicing response and add resume API by @streamsunshine in #154
- Support DeepSpeed on autoTP and kernel injection by @KevinNuNu and @wangruohui in #138
- Add github action for publishing docker image by @RunningLeon in #148
🐞 Bug fixes
- Fix getting package root path error in python3.9 by @lvhan028 in #157
- Carriage return caused overwriting on the same line by @wangruohui in #143
- Fix the offset during streaming chat by @lvhan028 in #142
- Fix concatenate bug in benchmark serving script by @rollroll90 in #134
- Fix attempted_relative_import by @KevinNuNu in #125
📚 Documentations
- Translate `en/quantization.md` into Chinese by @xin-li-67 in #166
- Check-in benchmark on real conversation data by @lvhan028 in #156
- Fix typo and missing dependent packages in README and requirements.txt by @vansin in #123, @APX103 in #109, @AllentDan in #119 and @del-zhenwu in #124
- Add turbomind's architecture documentation by @lzhangzz in #101
New Contributors
@streamsunshine, @del-zhenwu, @APX103, @xin-li-67, @KevinNuNu, @rollroll90