Dear author, I ran into the following problem while reproducing your code. I downloaded llama2-7b-hf and run the code by loading the model locally. Training is interrupted partway through with an error saying the input contains NaN. Do you have a suggested fix?
(Tallrec) ubuntu@ubuntu:~/0522231063/Githubfuxian/TALLRec/TALLRec-main$ bash ./shell/instruct_7B.sh 1 42
1, 42
lr: 1e-4, dropout: 0.05 , seed: 42, sample: 64
Training Alpaca-LoRA model with params:
base_model: /home/ubuntu/llama2-7b-hf
train_data_path: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/data/book/train.json
val_data_path: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/data/book/valid.json
sample: 64
seed: 42
output_dir: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/model_42_64
batch_size: 128
micro_batch_size: 32
num_epochs: 200
learning_rate: 0.0001
cutoff_len: 512
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: True
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/alpaca-lora-7B/adapter_config.json
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.23s/it]
Checkpoint /home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/alpaca-lora-7B/adapter_config.json/adapter_model.bin not found
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 1383.30 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2427/2427 [00:01<00:00, 1593.21 examples/s]
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
0%| | 0/200 [00:00<?, ?it/s]/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 1.0435, 'learning_rate': 4e-05, 'epoch': 8.0}
{'eval_loss': 2.0803046226501465, 'eval_auc': 0.499365194424907, 'eval_runtime': 162.0148, 'eval_samples_per_second': 14.98, 'eval_steps_per_second': 1.876, 'epoch': 10.0}
5%|███████▊ | 10/200 [05:12<47:31, 15.01s/it]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 1.0116, 'learning_rate': 8e-05, 'epoch': 16.0}
{'eval_loss': 1.836759328842163, 'eval_auc': 0.5092683767063315, 'eval_runtime': 162.8768, 'eval_samples_per_second': 14.901, 'eval_steps_per_second': 1.866, 'epoch': 20.0}
10%|███████████████▌ | 20/200 [10:25<51:03, 17.02s/it]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 0.9067, 'learning_rate': 9.777777777777778e-05, 'epoch': 24.0}
{'eval_loss': 1.3795101642608643, 'eval_auc': 0.5749956938005082, 'eval_runtime': 162.5594, 'eval_samples_per_second': 14.93, 'eval_steps_per_second': 1.87, 'epoch': 30.0}
15%|███████████████████████▍ | 30/200 [15:38<48:25, 17.09s/it]
/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
{'loss': 0.7264, 'learning_rate': 9.333333333333334e-05, 'epoch': 32.0}
{'loss': 0.5474, 'learning_rate': 8.888888888888889e-05, 'epoch': 40.0}
20%|███████████████████████████████▏ | 40/200 [18:09<45:34, 17.09s/it]
Traceback (most recent call last):██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 304/304 [02:41<00:00, 2.19it/s]
File "/home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/finetune_rec.py", line 325, in
fire.Fire(train)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/finetune_rec.py", line 292, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 2287, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 2993, in evaluate
output = eval_loop(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/transformers/trainer.py", line 3281, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "/home/ubuntu/0522231063/Githubfuxian/TALLRec/TALLRec-main/finetune_rec.py", line 222, in compute_metrics
auc = roc_auc_score(pre[1], pre[0])
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/metrics/_ranking.py", line 606, in roc_auc_score
y_score = check_array(y_score, ensure_2d=False)
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1003, in check_array
_assert_all_finite(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/validation.py", line 126, in _assert_all_finite
_assert_all_finite_element_wise(
File "/home/ubuntu/anaconda3/envs/Tallrec/lib/python3.10/site-packages/sklearn/utils/validation.py", line 175, in _assert_all_finite_element_wise
raise ValueError(msg_err)
ValueError: Input contains NaN.
20%|██████████████████████████████▊ | 40/200 [20:51<1:23:27, 31.30s/it]
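For reference, a minimal sketch of the kind of guard one might drop into `compute_metrics` to make this failure easier to diagnose (names follow the traceback above: `pre[0]` holds the predicted scores, `pre[1]` the labels; dropping NaN scores before the AUC call is my own assumption about what is useful here, not the repo's fix):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def compute_metrics(pre):
    # Hypothetical replacement for compute_metrics in finetune_rec.py, for debugging only.
    # pre[0]: predicted scores, pre[1]: ground-truth labels (argument order as in the traceback).
    scores = np.asarray(pre[0], dtype=np.float64)
    labels = np.asarray(pre[1])
    # Count and drop non-finite scores so the log reports how many predictions
    # went bad instead of crashing inside sklearn's input validation.
    nan_mask = ~np.isfinite(scores)
    if nan_mask.any():
        print(f"compute_metrics: {int(nan_mask.sum())} / {scores.size} predicted scores are not finite")
        scores, labels = scores[~nan_mask], labels[~nan_mask]
    return {"auc": roc_auc_score(labels, scores)}
```

If every score turns out to be NaN, the problem is upstream of the metric (non-finite logits produced during evaluation), and a guard like this only localizes it; it does not cure the underlying numerical issue.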