GPTQ quantization of the LoRA fine-tuned Qwen2.5-32B-Instruct fails: torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 25942 is not positive-definite). #6057
Labels
pending
This problem is yet to be addressed
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
```yaml
### model
model_name_or_path: models/qwen25_32B_lora_sft
template: qwen

### export
export_dir: models/Qwen2.5-32B-Instruct-GPTQ-Int4
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.json
export_size: 2
export_device: cpu
export_legacy_format: false
```
Expected behavior
After LoRA fine-tuning Qwen2.5-32B-Instruct and merging the adapter, running the 4-bit quantization export via llamafactory-cli fails partway through with the following error:
```
Quantizing model.layers blocks : 20%|████████▎ | 13/64 [26:54<1:45:34, 124.20s/it]
Traceback (most recent call last):
  File "/root/miniconda3/envs/xtuner-env/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/root/autodl-tmp/LLaMA-Factory-main/src/llamafactory/cli.py", line 87, in main
    export_model()
  File "/root/autodl-tmp/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 76, in export_model
    model = load_model(tokenizer, model_args, finetuning_args)  # must after fixing tokenizer to resize vocab
  File "/root/autodl-tmp/LLaMA-Factory-main/src/llamafactory/model/loader.py", line 156, in load_model
    model = load_class.from_pretrained(**init_kwargs)
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4003, in from_pretrained
    hf_quantizer.postprocess_model(model)
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/quantizers/base.py", line 195, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/quantizers/quantizer_gptq.py", line 85, in _process_model_after_weight_loading
    self.optimum_quantizer.quantize_model(model, self.quantization_config.tokenizer)
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 510, in quantize_model
    scale, zero, g_idx = gptq[name].fasterquant(
  File "/root/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/auto_gptq/quantization/gptq.py", line 118, in fasterquant
    H = torch.linalg.cholesky(H, upper=True)
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 25942 is not positive-definite).
(xtuner-env) root@autodl-container-824b40a131-f3c15f0d:~/autodl-tmp/LLaMA-Factory-main#
```
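As I understand the failure, fasterquant() Cholesky-factorizes the layer Hessian accumulated from the calibration activations, which only succeeds if that matrix is strictly positive-definite; GPTQ's damp_percent guards against near-singular Hessians by lifting the diagonal before factorizing. A minimal, self-contained sketch of that idea on a toy matrix (my own illustration, not the library's code; damp_hessian is a hypothetical helper):

```python
import torch

def damp_hessian(H: torch.Tensor, percdamp: float) -> torch.Tensor:
    # Hypothetical helper mirroring the damping idea in auto_gptq's
    # fasterquant: lift the diagonal by percdamp * mean(diag(H)) so that
    # a merely positive-semidefinite Hessian becomes positive-definite.
    damp = percdamp * torch.mean(torch.diag(H))
    idx = torch.arange(H.shape[0], device=H.device)
    H[idx, idx] += damp
    return H

X = torch.randn(4, 8)   # too few calibration "samples" for an 8x8 Gram matrix
H = X.T @ X             # rank <= 4, so H is singular (not positive-definite)
# torch.linalg.cholesky(H, upper=True) would raise torch._C._LinAlgError here
H = damp_hessian(H, percdamp=0.1)
U = torch.linalg.cholesky(H, upper=True)  # succeeds once the diagonal is lifted
print(U.shape)  # torch.Size([8, 8])
```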
Searching the web for this error turns up explanations that the number of calibration samples is insufficient. However, with the same LLaMA-Factory version (0.9.1.dev0) and the same fine-tuning and merge scripts, I previously ran this on the Hengyuan Cloud platform and the quantization completed correctly. All I did this time was migrate to a different platform, so I don't understand why this error now occurs.
PS: after reading about the insufficient-calibration-samples explanation, I tried the following modification in quantization.py:
```python
...
# debug for quantization
maxlen = model_args.export_quantization_maxlen
model_args.export_quantization_nsamples = 512
logger.info("==========================================================================")
logger.info(f"maxlen: {maxlen}")
logger.info(f"nsamples: {model_args.export_quantization_nsamples}")
...
```
The intent was to raise the number of calibration samples to 512 (I also switched the calibration set to my own dataset of 512 Chinese samples), but the same error still occurs.
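(A side note: since the patch above reads model_args.export_quantization_nsamples, this appears to be a regular model argument, so presumably it could also be set as `export_quantization_nsamples: 512` in the export YAML instead of editing quantization.py; either way, the error persists.)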
My question: the following AutoGPTQ comment suggests setting damp_percent=0.1 via BaseQuantizeConfig:
https://github.com/AutoGPTQ/AutoGPTQ/pull/111#issuecomment-1624014218
```python
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    damp_percent=0.1,
)
```
But this project does not seem to configure quantization through BaseQuantizeConfig. @hiyouga, could you point me to where damp_percent=0.1 should be set?
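My current guess, for reference: since LLaMA-Factory quantizes through transformers/optimum rather than calling auto_gptq directly, the equivalent knob should be the damp_percent field of transformers' GPTQConfig, which optimum forwards down to fasterquant(). A minimal sketch under that assumption (bits/dataset here are placeholders, not the project's actual call site in quantization.py):

```python
from transformers import GPTQConfig

# Sketch only, not the project's actual code: in the quantization.py
# patched above, LLaMA-Factory builds a GPTQConfig; damp_percent could
# presumably be passed alongside the existing arguments there.
gptq_config = GPTQConfig(
    bits=4,            # matches export_quantization_bit: 4
    dataset="c4",      # placeholder; the project supplies its own samples
    damp_percent=0.1,  # diagonal damping; larger values are more forgiving
)
print(gptq_config.damp_percent)  # 0.1
```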
Others
No response