About ValueError: #59

Open
LWZWTWLWZ opened this issue Jun 12, 2024 · 0 comments

LWZWTWLWZ commented Jun 12, 2024

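Dear author, I ran into the following problem while reproducing your code. I downloaded llama2-7b-hf and run the code against the local copy. The run was interrupted partway through with an error saying the input contains NaN. Do you have a good solution?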
(Tallrec) ubuntu@ubuntu:~/0522231063/Githubfuxian/TALLRec/TALLRec-main$ bash ./shell/instruct_7B.sh 1 42
1, 42
lr: 1e-4, dropout: 0.05 , seed: 42, sample: 64
Training Alpaca-LoRA model with params:
base_model: /home/ubuntu/llama2-7b-hf
train_data_path: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/data/book/train.json
val_data_path: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/data/book/valid.json
sample: 64
seed: 42
output_dir: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/model_42_64
batch_size: 128
micro_batch_size: 32
num_epochs: 200
learning_rate: 0.0001
cutoff_len: 512
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: True
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/alpaca-lora-7B/adapter_config.json

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.23s/it]
Checkpoint /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/alpaca-lora-7B/adapter_config.json/adapter_model.bin not found
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 1383.30 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2427/2427 [00:01<00:00, 1593.21 examples/s]
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
0%| | 0/200 [00:00<?, ?it/s]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 1.0435, 'learning_rate': 4e-05, 'epoch': 8.0}
{'eval_loss': 2.0803046226501465, 'eval_auc': 0.499365194424907, 'eval_runtime': 162.0148, 'eval_samples_per_second': 14.98, 'eval_steps_per_second': 1.876, 'epoch': 10.0}
5%|███████▊ | 10/200 [05:12<47:31, 15.01s/it]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 1.0116, 'learning_rate': 8e-05, 'epoch': 16.0}
{'eval_loss': 1.836759328842163, 'eval_auc': 0.5092683767063315, 'eval_runtime': 162.8768, 'eval_samples_per_second': 14.901, 'eval_steps_per_second': 1.866, 'epoch': 20.0}
10%|███████████████▌ | 20/200 [10:25<51:03, 17.02s/it]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 0.9067, 'learning_rate': 9.777777777777778e-05, 'epoch': 24.0}
{'eval_loss': 1.3795101642608643, 'eval_auc': 0.5749956938005082, 'eval_runtime': 162.5594, 'eval_samples_per_second': 14.93, 'eval_steps_per_second': 1.87, 'epoch': 30.0}
15%|███████████████████████▍ | 30/200 [15:38<48:25, 17.09s/it]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 0.7264, 'learning_rate': 9.333333333333334e-05, 'epoch': 32.0}
{'loss': 0.5474, 'learning_rate': 8.888888888888889e-05, 'epoch': 40.0}
20%|███████████████████████████████▏ | 40/200 [18:09<45:34, 17.09s/it]
██████████| 304/304 [02:41<00:00, 2.19it/s]
Traceback (most recent call last):
File "/home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/finetune_rec.py", line 325, in <module>
fire.Fire(train)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/finetune_rec.py", line 292, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 2993, in evaluate
output = eval_loop(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "/home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/finetune_rec.py", line 222, in compute_metrics
auc = roc_auc_score(pre[1], pre[0])
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/metrics/_ranking.py", line 606, in roc_auc_score
y_score = check_array(y_score, ensure_2d=False)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1003, in check_array
_assert_all_finite(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/validation.py", line 126, in _assert_all_finite
_assert_all_finite_element_wise(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/validation.py", line 175, in _assert_all_finite_element_wise
raise ValueError(msg_err)
ValueError: Input contains NaN.
20%|██████████████████████████████▊ | 40/200 [20:51<1:23:27, 31.30s/it]
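
The traceback ends in `roc_auc_score(pre[1], pre[0])` at `finetune_rec.py` line 222, where scikit-learn rejects a `y_score` that contains NaN. One way to narrow this down might be to count and drop the non-finite scores before the metric is computed. The sketch below is only an illustration: `split_finite_scores` is a hypothetical helper that is not part of TALLRec, and it assumes `pre[1]` holds the labels and `pre[0]` the scores as plain sequences of numbers. It does not address why the model produces NaN scores in the first place.

```python
import math


def split_finite_scores(labels, scores):
    """Drop (label, score) pairs whose score is NaN or infinite.

    Hypothetical helper, not part of TALLRec. Returns the kept labels,
    the kept scores, and the number of pairs that were dropped.
    """
    kept_labels, kept_scores = [], []
    dropped = 0
    for label, score in zip(labels, scores):
        if math.isfinite(score):
            kept_labels.append(label)
            kept_scores.append(score)
        else:
            dropped += 1
    return kept_labels, kept_scores, dropped


# Toy data standing in for pre[1] (labels) and pre[0] (scores).
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, float("nan"), float("inf")]

kept_labels, kept_scores, dropped = split_finite_scores(labels, scores)
print(kept_labels, kept_scores, dropped)  # [1, 0] [0.9, 0.2] 2
```

If `dropped` is nonzero, the filtered lists could be passed to `roc_auc_score` in place of the originals so that evaluation no longer aborts, provided both classes are still present among the kept labels. A large `dropped` count would suggest the NaN values originate in the model outputs rather than in the metric code.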
