bump version to v0.6.0a0 (#2371)
* bump version to v0.6.0a0

* miss one doc

* update w4a16.md
lvhan028 authored Aug 26, 2024
1 parent 91f6cdf commit 97b880b
Showing 7 changed files with 21 additions and 27 deletions.
2 changes: 1 addition & 1 deletion docs/en/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.5.3
export LMDEPLOY_VERSION=0.6.0a0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
4 changes: 2 additions & 2 deletions docs/en/multi_modal/minicpmv.md
@@ -153,7 +153,7 @@ docker run --runtime nvidia --gpus all \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 23333:23333 \
--ipc=host \
openmmlab/lmdeploy:v0.5.3-cu12 \
openmmlab/lmdeploy:latest \
lmdeploy serve api_server openbmb/MiniCPM-V-2_6
```
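
Once the container is running, the server exposes an OpenAI-compatible route on the mapped port. A minimal sketch of a request is shown below; the prompt text and image URL are illustrative placeholders rather than values from the docs:

```shell
# Query the api_server through its OpenAI-compatible chat completions endpoint.
# The prompt text and image URL are placeholders for illustration only.
curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openbmb/MiniCPM-V-2_6",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}}
          ]
        }]
      }'
```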

@@ -165,7 +165,7 @@ version: '3.5'
services:
lmdeploy:
container_name: lmdeploy
image: openmmlab/lmdeploy:v0.5.3-cu12
image: openmmlab/lmdeploy:latest
ports:
- "23333:23333"
environment:
17 changes: 6 additions & 11 deletions docs/en/quantization/w4a16.md
@@ -1,22 +1,17 @@
# AWQ
# AWQ/GPTQ

LMDeploy adopts the [AWQ](https://arxiv.org/abs/2306.00978) algorithm for 4-bit weight-only quantization. With a purpose-built high-performance CUDA kernel, inference of the 4-bit quantized model runs up to 2.4x faster than FP16.
The LMDeploy TurboMind engine supports inference of 4-bit models quantized by either [AWQ](https://arxiv.org/abs/2306.00978) or [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ), but its quantization module only supports the AWQ quantization algorithm.
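
As a rough sketch of the workflow described above, quantizing a model with the AWQ module and then running it on TurboMind might look like the following; the model name, output directory, and flag values are illustrative placeholders based on the commonly documented options:

```shell
# Quantize the weights to 4 bit with AWQ (the only algorithm the quantization module supports).
# Model name, work directory, and flag values are illustrative placeholders.
lmdeploy lite auto_awq internlm/internlm2_5-7b-chat \
  --w-bits 4 \
  --w-group-size 128 \
  --work-dir ./internlm2_5-7b-chat-4bit

# Run the quantized model with the TurboMind engine; --model-format awq declares the weight layout.
lmdeploy chat ./internlm2_5-7b-chat-4bit --model-format awq
```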

LMDeploy supports the following NVIDIA GPUs for W4A16 inference:
The following NVIDIA GPUs are available for AWQ/GPTQ INT4 inference:

- V100(sm70): V100
- Turing(sm75): 20 series, T4

- Ampere(sm80,sm86): 30 series, A10, A16, A30, A100

- Ada Lovelace(sm89): 40 series

Before proceeding with the quantization and inference, please ensure that lmdeploy is installed.

```shell
pip install lmdeploy[all]
```
Before proceeding with the quantization and inference, please ensure that lmdeploy is installed by following the [installation guide](../installation.md).

This article comprises the following sections:
The remainder of this article is structured into the following sections:

<!-- toc -->

2 changes: 1 addition & 1 deletion docs/zh_cn/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy with the following commands:

```shell
export LMDEPLOY_VERSION=0.5.3
export LMDEPLOY_VERSION=0.6.0a0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
4 changes: 2 additions & 2 deletions docs/zh_cn/multi_modal/minicpmv.md
@@ -153,7 +153,7 @@ docker run --runtime nvidia --gpus all \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 23333:23333 \
--ipc=host \
openmmlab/lmdeploy:v0.5.3-cu12 \
openmmlab/lmdeploy:latest \
lmdeploy serve api_server openbmb/MiniCPM-V-2_6
```

@@ -165,7 +165,7 @@ version: '3.5'
services:
lmdeploy:
container_name: lmdeploy
image: openmmlab/lmdeploy:v0.5.3-cu12
image: openmmlab/lmdeploy:latest
ports:
- "23333:23333"
environment:
17 changes: 8 additions & 9 deletions docs/zh_cn/quantization/w4a16.md
@@ -1,18 +1,17 @@
# INT4 Model Quantization and Deployment

LMDeploy uses the AWQ algorithm to realize 4-bit weight quantization of models. The TurboMind inference engine provides a highly efficient 4-bit inference CUDA kernel, with performance more than 2.4x that of FP16. It supports the following NVIDIA GPUs:
The LMDeploy TurboMind engine supports inference of 4-bit models quantized by either [AWQ](https://arxiv.org/abs/2306.00978) or [GPTQ](https://github.com/AutoGPTQ/AutoGPTQ). However, the LMDeploy quantization module currently only supports the AWQ quantization algorithm.

- Turing (sm75): 20 series, T4
- Ampere (sm80, sm86): 30 series, A10, A16, A30, A100
- Ada Lovelace (sm89): 40 series
NVIDIA GPUs that can be used for AWQ/GPTQ INT4 inference include:

Before quantization and deployment, please make sure lmdeploy is installed.
- V100(sm70): V100
- Turing(sm75): 20 series, T4
- Ampere(sm80,sm86): 30 series, A10, A16, A30, A100
- Ada Lovelace(sm89): 40 series

```shell
pip install lmdeploy[all]
```
Before proceeding with quantization and inference, please make sure lmdeploy is installed by following the [installation guide](../installation.md).

This article comprises the following sections:
The remainder of this article is organized into the following sections:

<!-- toc -->

2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

__version__ = '0.5.3'
__version__ = '0.6.0a0'
short_version = __version__


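As a quick sanity check after upgrading, the installed version can be printed from Python; this assumes the package re-exports `__version__` at the top level:

```shell
# Print the installed lmdeploy version (assumes lmdeploy re-exports __version__).
python -c "import lmdeploy; print(lmdeploy.__version__)"
# Expected to print 0.6.0a0 after this bump.
```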

