Releases · InternLM/lmdeploy

19 Jan 10:38

lvhan028

v0.2.1

e96e2b4

LMDeploy Release V0.2.1

What's Changed

💥 Improvements

[Fix] internlm2 chat format by @Harold-lkk in #1002

🐞 Bug fixes

fix baichuan2 conversion by @AllentDan in #972
[Fix] internlm messages2prompt by @Harold-lkk in #1003

📚 Documentations

add guide about installation on cuda 12+ platform by @lvhan028 in #988

🌐 Other

bump version to v0.2.1 by @lvhan028 in #1005

Full Changelog: v0.2.0...v0.2.1

Contributors

lvhan028, Harold-lkk, and AllentDan

17 Jan 02:00

lvhan028

v0.2.0

b319dce

LMDeploy Release V0.2.0

What's Changed

🚀 Features

Support internlm2 by @lvhan028 in #963
[Feature] Add params config for api server web_ui by @amulil in #735
[Feature] Merge lmdeploy lite calibrate and lmdeploy lite auto_awq by @pppppM in #849
Compute cross entropy loss given a list of input tokens by @lvhan028 in #830
Support QoS in api_server by @sallyjunjun in #877
Refactor torch inference engine by @lvhan028 in #871
add image chat demo by @irexyc in #874
check-in generation config by @lvhan028 in #902
check-in ModelConfig by @AllentDan in #907
pytorch engine config by @grimoire in #908
Check-in turbomind engine config by @irexyc in #909
S-LoRA support by @grimoire in #894
add __init__ in adapters by @grimoire in #923
Refactor LLM inference pipeline API by @AllentDan in #916
Refactor gradio and api_server by @AllentDan in #918
Add request distributor server by @AllentDan in #903
Upgrade lmdeploy cli by @RunningLeon in #922

💥 Improvements

add top_k value for /v1/completions and update the documents by @AllentDan in #870
export "num_tokens_per_iter", "max_prefill_iters", etc. when converting a model by @lvhan028 in #845
Move api_server dependencies from serve.txt to runtime.txt by @lvhan028 in #879
Refactor benchmark bash script by @lvhan028 in #884
Add test case for function regression by @zhulinJulia24 in #844
Update test triton CI by @RunningLeon in #893
Update dockerfile by @RunningLeon in #891
Perform fuzzy matching on chat template according to model path by @AllentDan in #839
support accessing lmdeploy version via lmdeploy.version_info by @lvhan028 in #910
Remove flash-attn dependency of lmdeploy lite module by @lvhan028 in #917
Improve setup by removing pycuda dependency and adding cuda runtime and cublas to RPATH by @irexyc in #912
remove unused settings in turbomind engine config by @irexyc in #921
Cleanup fixed attributes in turbomind engine config by @irexyc in #928
fix get_gpu_mem by @grimoire in #934
remove instance_num argument by @AllentDan in #931
Fix matching results of several chat templates like llama2, solar, yi and so on by @AllentDan in #925
add pytorch random sampling by @grimoire in #930
suppress turbomind chat warning by @irexyc in #937
modify type hint of api to avoid importing _turbomind by @AllentDan in #936
accelerate pytorch benchmark by @grimoire in #946
Remove tp from pipeline argument list by @lvhan028 in #947
set gradio default value the same as chat.py by @AllentDan in #949
print help for cli in case of failure by @RunningLeon in #955
return dataclass for pipeline by @AllentDan in #952
set random seed when it is None by @AllentDan in #958
avoid running get_logger when importing lmdeploy by @RunningLeon in #956
support mlp s-lora by @grimoire in #957
skip resume logic for pytorch backend by @AllentDan in #968
Add ci for ut by @RunningLeon in #966

🐞 Bug fixes

add tritonclient req by @RunningLeon in #872
Fix uninitialized parameter by @lvhan028 in #875
Fix overflow by @irexyc in #897
Fix data offset by @AllentDan in #900
Fix context decoding stuck issue when tp > 1 by @irexyc in #904
[Fix] set scaling_factor 1 forcefully when sequence length is less than max_pos_emb by @lvhan028 in #911
fix pytorch llama2 with new transformers by @grimoire in #914
fix local variable 'output_ids' referenced before assignment by @irexyc in #919
fix pipeline stop_words type error by @AllentDan in #929
pass stop words to openai api by @AllentDan in #887
fix profile generation multiprocessing error by @AllentDan in #933
Add missing __init__.py to modeling folder by @lvhan028 in #951
fix cli with special arg names by @RunningLeon in #959
fix logger in tokenizer by @RunningLeon in #960

📚 Documentations

Improve user guide by @lvhan028 in #899
Add user guide about pytorch engine by @grimoire in #915
Update supported models and add quick start section in README by @lvhan028 in #926
Fix scripts in benchmark doc by @panli889 in #941
Update get_started and w4a16 tutorials by @lvhan028 in #945
Add more docstring to api_server and proxy_server by @AllentDan in #965
stabilize api_server benchmark results with a non-zero await by @AllentDan in #885
fix pytorch backend not stopping properly by @AllentDan in #962
[Fix] Fix calibrate bug when transformers>4.36 by @pppppM in #967

🌐 Other

bump version to v0.2.0 by @lvhan028 in #969

New Contributors

@amulil made their first contribution in #735
@zhulinJulia24 made their first contribution in #844
@sallyjunjun made their first contribution in #877
@panli889 made their first contribution in #941

Full Changelog: v0.1.0...v0.2.0

Contributors

grimoire, panli889, and 8 other contributors

18 Dec 12:10

lvhan028

v0.1.0

477f2db

LMDeploy Release V0.1.0

What's Changed

🚀 Features

Add extra_requires to reduce dependencies by @RunningLeon in #580
TurboMind 2 by @lzhangzz in #590
Support loading hf model directly by @irexyc in #685
convert model with hf repo_id by @irexyc in #774
Support turbomind bf16 by @grimoire in #803
support image_embs input by @irexyc in #799
Add api.py by @AllentDan in #805

💥 Improvements

Fix Tokenizer encode by @AllentDan in #645
Optimize for throughput by @lzhangzz in #701
Replace mmengine with mmengine-lite by @zhouzaida in #715
Set the default value of max_context_token_num to 1 by @lvhan028 in #761
add triton server test and workflow yml by @RunningLeon in #760
improvement(build): enable ninja and gold linker by @tpoisonooo in #767
Report first-token-latency and token-latency percentiles by @lvhan028 in #736
Unify prefill & decode passes by @lzhangzz in #775
add cuda12.1 build check ci by @irexyc in #782
auto upload cuda12.1 python pkg to release when a new tag is created by @irexyc in #784
Report the inference benchmark of models with different sizes by @lvhan028 in #794
Simplify block manager by @lzhangzz in #812
Disable attention mask when it is not needed by @lzhangzz in #813
FIFO pipe strategy for api_server by @AllentDan in #795
simplify the header of the benchmark table by @lvhan028 in #820
add encode for opencompass by @AllentDan in #828
fix: awq should save bin files by @hscspring in #793
Support building docker image manually in CI by @RunningLeon in #825

🐞 Bug fixes

Fix init of batch state by @lzhangzz in #682
fix turbomind stream canceling by @grimoire in #686
[Fix] Fix load_checkpoint_in_model bug by @HIT-cwh in #690
Fix wrong eos_id and bos_id obtained through grpc api by @lvhan028 in #644
Fix cache/output length calculation by @lzhangzz in #738
[Fix] Skip empty batch by @lzhangzz in #747
[Fix] build docker image failed since packaging is missing by @lvhan028 in #753
[Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto by @lvhan028 in #758
fix turbomind build on sm<80 by @grimoire in #754
Fix early-exit condition in attention kernel by @lzhangzz in #788
Fix missed arguments when benchmarking static inference performance by @lvhan028 in #787
fix extra colon in InternLMChat7B template by @C1rN09 in #796
Fix local kv head num by @lvhan028 in #806
Fix out-of-bound access by @lzhangzz in #809
Set smem size for repetition penalty kernel by @lzhangzz in #818
Fix cache verification by @lzhangzz in #821
fix finish_reason by @AllentDan in #816
fix turbomind awq by @grimoire in #847
Fix stop requests by awaiting before turbomind queue.get() by @AllentDan in #850
[Fix] Fix meta tensor error by @pppppM in #848
Fix cuda reinitialization in a multiprocessing setting by @grimoire in #862
launch gradio server directly with hf model by @AllentDan in #856
fix typo by @grimoire in #769
Add chat template for Yi by @AllentDan in #779
fix api_server stop_session and end_session by @AllentDan in #835
Return the iterator after erasing it from a map by @irexyc in #864

📚 Documentations

[Docs] Update Supported Matrix by @pppppM in #679
[Docs] Update KV8 Docs by @pppppM in #681
[Doc] Update restful api doc by @AllentDan in #662
Check-in user guide about turbomind config by @lvhan028 in #680
Update benchmark user guide by @lvhan028 in #763
[Docs] Fix typo in restful_api user guide by @maxchiron in #858
[Docs] Fix typo in restful_api user guide by @maxchiron in #859

🌐 Other

bump version to v0.1.0a0 by @lvhan028 in #709
bump version to 0.1.0a1 by @lvhan028 in #776
bump version to v0.1.0a2 by @lvhan028 in #807
bump version to v0.1.0 by @lvhan028 in #834

New Contributors

@zhouzaida made their first contribution in #715
@C1rN09 made their first contribution in #796
@maxchiron made their first contribution in #858

Full Changelog: v0.0.14...v0.1.0

Contributors

grimoire, lvhan028, and 11 other contributors

06 Dec 06:50

lvhan028

v0.1.0a2

fddad30

LMDeploy Release V0.1.0a2

What's Changed

💥 Improvements

Unify prefill & decode passes by @lzhangzz in #775
add cuda12.1 build check ci by @irexyc in #782
auto upload cuda12.1 python pkg to release when a new tag is created by @irexyc in #784
Report the inference benchmark of models with different sizes by @lvhan028 in #794
Add chat template for Yi by @AllentDan in #779

🐞 Bug fixes

Fix early-exit condition in attention kernel by @lzhangzz in #788
Fix missed arguments when benchmarking static inference performance by @lvhan028 in #787
fix extra colon in InternLMChat7B template by @C1rN09 in #796
Fix local kv head num by @lvhan028 in #806

📚 Documentations

Update benchmark user guide by @lvhan028 in #763

🌐 Other

bump version to v0.1.0a2 by @lvhan028 in #807

New Contributors

@C1rN09 made their first contribution in #796

Full Changelog: v0.1.0a1...v0.1.0a2

Contributors

lvhan028, irexyc, and 3 other contributors

29 Nov 13:51

lvhan028

v0.1.0a1

9c46b27

LMDeploy Release V0.1.0a1

What's Changed

💥 Improvements

Set the default value of max_context_token_num to 1 by @lvhan028 in #761
add triton server test and workflow yml by @RunningLeon in #760
improvement(build): enable ninja and gold linker by @tpoisonooo in #767
Report first-token-latency and token-latency percentiles by @lvhan028 in #736
convert model with hf repo_id by @irexyc in #774

🐞 Bug fixes

[Fix] build docker image failed since packaging is missing by @lvhan028 in #753
[Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto by @lvhan028 in #758
fix turbomind build on sm<80 by @grimoire in #754
fix typo by @grimoire in #769

🌐 Other

bump version to 0.1.0a1 by @lvhan028 in #776

Full Changelog: v0.1.0a0...v0.1.0a1

Contributors

grimoire, lvhan028, and 3 other contributors

23 Nov 13:05

lvhan028

v0.1.0a0

a7c5007

LMDeploy Release V0.1.0a0

What's Changed

🚀 Features

Add extra_requires to reduce dependencies by @RunningLeon in #580
TurboMind 2 by @lzhangzz in #590
Support loading hf model directly by @irexyc in #685

💥 Improvements

Fix Tokenizer encode by @AllentDan in #645
Optimize for throughput by @lzhangzz in #701
Replace mmengine with mmengine-lite by @zhouzaida in #715

🐞 Bug fixes

Fix init of batch state by @lzhangzz in #682
fix turbomind stream canceling by @grimoire in #686
[Fix] Fix load_checkpoint_in_model bug by @HIT-cwh in #690
Fix wrong eos_id and bos_id obtained through grpc api by @lvhan028 in #644
Fix cache/output length calculation by @lzhangzz in #738
[Fix] Skip empty batch by @lzhangzz in #747

📚 Documentations

[Docs] Update Supported Matrix by @pppppM in #679
[Docs] Update KV8 Docs by @pppppM in #681
[Doc] Update restful api doc by @AllentDan in #662
Check-in user guide about turbomind config by @lvhan028 in #680

🌐 Other

bump version to v0.1.0a0 by @lvhan028 in #709

New Contributors

@zhouzaida made their first contribution in #715

Full Changelog: v0.0.14...v0.1.0a0

Contributors

grimoire, lvhan028, and 7 other contributors

09 Nov 12:13

lvhan028

v0.0.14

7b20cfd

LMDeploy Release V0.0.14

What's Changed

💥 Improvements

Improve api_server and webui usage by @AllentDan in #544
fix: gradio gr.Button.update deprecated after 4.0.0 by @hscspring in #637
add cli to list the supported model names by @RunningLeon in #639
Refactor model conversion by @irexyc in #296
[Enhance] internlm message to prompt by @Harold-lkk in #499
update turbomind session_len with model.session_len by @AllentDan in #634
Manage session id using random int for gradio local mode by @aisensiy in #553
Add UltraCM and WizardLM chat templates by @AllentDan in #599
Add check env sub command by @RunningLeon in #654

🐞 Bug fixes

[Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized by @pppppM in #605
FIX: fix stop_session func bug by @yunzhongyan0 in #578
fix benchmark serving computation mistake by @AllentDan in #630
fix Tokenizer load error when the path of the model being converted is not writable by @irexyc in #669
fix tokenizer_info when converting the model by @irexyc in #661

🌐 Other

bump version to v0.0.14 by @lvhan028 in #663

New Contributors

@hscspring made their first contribution in #637
@yunzhongyan0 made their first contribution in #578

Full Changelog: v0.0.13...v0.0.14

Contributors

aisensiy, lvhan028, and 7 other contributors

30 Oct 06:35

lvhan028

v0.0.13

56942c4

LMDeploy Release V0.0.13

What's Changed

🚀 Features

Add more user-friendly CLI by @RunningLeon in #541

💥 Improvements

support inference on a batch of prompts by @AllentDan in #467

📚 Documentations

Add "build from docker" section by @lvhan028 in #602

🌐 Other

bump version to v0.0.13 by @lvhan028 in #620

Full Changelog: v0.0.12...v0.0.13

Contributors

lvhan028, RunningLeon, and AllentDan

24 Oct 04:23

lvhan028

v0.0.12

96f1b8e

LMDeploy Release V0.0.12

What's Changed

🚀 Features

add solar chat template by @AllentDan in #576 and #587

💥 Improvements

change model_format to qwen when model_name starts with qwen by @lvhan028 in #575
robust incremental decoding for leading spaces by @AllentDan in #581

🐞 Bug fixes

avoid splitting Chinese characters during decoding by @AllentDan in #566
Revert "[Docs] Simplify build.md" by @pppppM in #586
Fix crash and remove sys_instruct from chat.py and client.py by @irexyc in #591
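Background for #566 above: when a tokenizer works at the byte level, one multi-byte UTF-8 character (every common Chinese character takes three bytes) can be spread over several generated tokens, so decoding each token's bytes in isolation yields replacement characters. The sketch below is a generic Python illustration of the usual remedy, buffering incomplete byte sequences with an incremental decoder until the character is complete. It does not claim to reproduce lmdeploy's actual fix; `stream_decode`, `naive_decode`, and the way the bytes are chunked are hypothetical.

```python
import codecs


def stream_decode(chunks):
    """Decode a stream of UTF-8 byte chunks, emitting only complete characters.

    `chunks` is a hypothetical stand-in for the byte pieces a byte-level
    tokenizer might produce per generated token; it is not an lmdeploy API.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in chunks:
        # Bytes of a partially received character stay in the decoder's
        # buffer and are only returned once the character is complete.
        pieces.append(decoder.decode(chunk))
    # Flush at end of stream; raises UnicodeDecodeError if the stream
    # stops in the middle of a character.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


def naive_decode(chunks):
    # Decoding each chunk on its own garbles characters split across chunks.
    return [chunk.decode("utf-8", errors="replace") for chunk in chunks]


# "中文" encodes to six UTF-8 bytes; split them unevenly, as token
# boundaries might.
data = "中文".encode("utf-8")
chunks = [data[:2], data[2:4], data[4:]]
print(naive_decode(chunks))
print(stream_decode(chunks))
```

The first chunk holds only two of the three bytes of "中", so the incremental decoder returns an empty string for it and emits the character on the next call, while the naive version has already produced U+FFFD replacement characters.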

🌐 Other

bump version to v0.0.12 by @lvhan028 in #604

Full Changelog: v0.0.11...v0.0.12

Contributors

lvhan028, irexyc, and 2 other contributors

17 Oct 06:19

lvhan028

v0.0.11

bb3cce9

LMDeploy Release V0.0.11

What's Changed

🚀 Features

Support CORS for openai api server by @aisensiy in #481

💥 Improvements

make IPv6 compatible and run safely when a coroutine is interrupted by @AllentDan in #487
support deploying qwen-14b-chat by @irexyc in #482
add tp hint for deployment by @irexyc in #555
Move tokenizer.py to the folder of lmdeploy by @grimoire in #543

🐞 Bug fixes

Change shared_instance type from weak_ptr to shared_ptr by @lvhan028 in #507
[Fix] Set the default value of step to 0 by @lvhan028 in #532
[bug] fix mismatched shape for decoder output tensor by @akhoroshev in #517
Fix typing of openai protocol. by @mokeyish in #554

📚 Documentations

Fix typo in docs/en/pytorch.md by @shahrukhx01 in #539
[Doc] update huggingface internlm-chat-7b model url by @AllentDan in #546
[doc] Update benchmark command in w4a16.md by @del-zhenwu in #500

🌐 Other

free runner disk by @irexyc in #552
bump version to v0.0.11 by @lvhan028 in #567

New Contributors

@shahrukhx01 made their first contribution in #539
@mokeyish made their first contribution in #554

Full Changelog: v0.0.10...v0.0.11

Contributors

aisensiy, grimoire, and 7 other contributors
