
GPTQ quantization of a LoRA fine-tuned Qwen2.5-32B-Instruct fails with: torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 25942 is not positive-definite). #6057

Open
1 task done
camposs1979 opened this issue Nov 18, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Comments

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-47-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.4.1+cu121
  • Transformers version: 4.43.4
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • DeepSpeed version: 0.14.5
  • Bitsandbytes version: 0.44.1
  • vLLM version: 0.4.3

Reproduction

### model
model_name_or_path: models/qwen25_32B_lora_sft
template: qwen

### export
export_dir: models/Qwen2.5-32B-Instruct-GPTQ-Int4
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.json
export_size: 2
export_device: cpu
export_legacy_format: false

Expected behavior

After LoRA fine-tuning Qwen2.5-32B-Instruct and merging the adapter, 4-bit quantization fails partway through:
Quantizing model.layers blocks : 20%|████████▎ | 13/64 [26:54<1:45:34, 124.20s/it]
Traceback (most recent call last):
File "/root/miniconda3/envs/xtuner-env/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/root/autodl-tmp/LLaMA-Factory-main/src/llamafactory/cli.py", line 87, in main
export_model()
File "/root/autodl-tmp/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 76, in export_model
model = load_model(tokenizer, model_args, finetuning_args) # must after fixing tokenizer to resize vocab
File "/root/autodl-tmp/LLaMA-Factory-main/src/llamafactory/model/loader.py", line 156, in load_model
model = load_class.from_pretrained(**init_kwargs)
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4003, in from_pretrained
hf_quantizer.postprocess_model(model)
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/quantizers/base.py", line 195, in postprocess_model
return self._process_model_after_weight_loading(model, **kwargs)
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/quantizers/quantizer_gptq.py", line 85, in _process_model_after_weight_loading
self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 510, in quantize_model
scale, zero, g_idx = gptq[name].fasterquant(
File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/auto_gptq/quantization/gptq.py", line 118, in fasterquant
H = torch.linalg.cholesky(H, upper=True)
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 25942 is not positive-definite).
(xtuner-env) root@autodl-container-824b40a131-f3c15f0d:~/autodl-tmp/LLaMA-Factory-main#

Searching online for this error message suggests the number of calibration samples is insufficient. However, with the same LLaMA-Factory version (0.9.1.dev0) and the same fine-tuning and merge scripts, I previously ran this on the Hengyuan Cloud platform and the quantization completed correctly. This time I only migrated to a different platform, and I don't understand why this error now appears.
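For context, a minimal NumPy sketch of why this step can fail (an illustration of the mechanism, not LLaMA-Factory or AutoGPTQ code): GPTQ accumulates a layer Hessian roughly of the form H = 2·X·Xᵀ from calibration activations, and if the calibration data spans too few directions relative to the layer's hidden dimension, H is rank-deficient, hence not positive-definite, and the Cholesky factorization fails exactly as in the traceback above:

```python
import numpy as np

# Hypothetical illustration: X holds calibration activations
# (hidden_dim x n_tokens). With fewer calibration tokens than
# hidden dimensions, H = 2 * X @ X.T has rank <= n_tokens and
# is singular, so Cholesky cannot complete.
hidden_dim, n_tokens = 64, 16          # deliberately too few tokens
rng = np.random.default_rng(0)
X = rng.standard_normal((hidden_dim, n_tokens))
H = 2.0 * X @ X.T                      # rank <= 16 < 64

try:
    np.linalg.cholesky(H)
    factorized = True
except np.linalg.LinAlgError:
    factorized = False
print(factorized)  # False: same failure mode as the traceback
```

This is consistent with the "insufficient calibration samples" explanation, though numerical issues (BLAS/LAPACK builds differing between platforms) can also tip a borderline Hessian from barely positive-definite to singular.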
PS: after learning about the insufficient-calibration-samples issue, I tried the following modification in quantization.py:
...
#debug for quantization
maxlen = model_args.export_quantization_maxlen
model_args.export_quantization_nsamples = 512
logger.info("==========================================================================")
logger.info(f"maxlen: {maxlen}")
logger.info(f"nsamples: {model_args.export_quantization_nsamples}")

....
I hoped this would raise the number of calibration samples to 512 (I also swapped the calibration dataset for my own, containing 512 Chinese samples), but the same error still occurs.
My questions are:

  1. Why does the same project version, dataset, and training script throw this error on a different platform? Only the rented cloud platform changed; I kept all pip packages identical.
  2. Does increasing the number of calibration samples actually help with this problem? I also found another approach online: setting damp_percent to 0.1:
    https://github.com/AutoGPTQ/AutoGPTQ/pull/111#issuecomment-1624014218
    quantize_config = BaseQuantizeConfig(
        bits=4,
        group_size=128,
        desc_act=False,
        damp_percent=0.1,
    )
    But this project doesn't seem to configure quantization that way. @hiyouga, could you point out where damp_percent=0.1 should be set?
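To illustrate what damp_percent does (a hedged sketch of the mechanism the linked AutoGPTQ PR describes, not this project's code): before the Cholesky step, a fraction of the mean of the Hessian diagonal is added back onto the diagonal. This shifts every eigenvalue upward and restores positive-definiteness even when the raw Hessian is singular:

```python
import numpy as np

# Hypothetical sketch of GPTQ's damping step: add
# damp_percent * mean(diag(H)) to H's diagonal so that the
# smallest eigenvalue becomes strictly positive.
def damp_hessian(H: np.ndarray, damp_percent: float = 0.1) -> np.ndarray:
    damp = damp_percent * np.mean(np.diag(H))
    return H + damp * np.eye(H.shape[0])

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))
H = 2.0 * X @ X.T                      # singular: raw Cholesky would fail

L = np.linalg.cholesky(damp_hessian(H))       # succeeds after damping
print(np.allclose(L @ L.T, damp_hessian(H)))  # True
```

In numerical terms this is ridge (Tikhonov) regularization: a larger damp_percent makes the factorization more robust at a small cost in quantization accuracy.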

Others

No response
