when train KeyError: None #51

Open
sankexin opened this issue Oct 25, 2024 · 2 comments

Comments

@sankexin

train:
export PYTHONPATH=./
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
OUTPUT_DIR=outputs/vita_video_audio
bash script/train/finetuneTask_nodes.sh ${OUTPUT_DIR}

config:
AudioFolder = "input_wavs"
FolderDict = {
    #### NaturalCap
    "sharegpt4": "ShareGPT4V",
}

#### NaturalCap
ShareGPT4V = {"chat_path": "ShareGPT4V/sharegpt4v_instruct_gpt4-vision_cap100k.json"}

[rank4]: Traceback (most recent call last):
[rank4]: File "/home/VITA/vita/train/train.py", line 407, in <module>
[rank4]: train()
[rank4]: File "/home/VITA/vita/train/train.py", line 387, in train
[rank4]: trainer.train()
[rank4]: File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
[rank4]: return inner_training_loop(
[rank4]: File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2178, in _inner_training_loop
[rank4]: for step, inputs in enumerate(epoch_iterator):
[rank4]: File "/usr/local/lib/python3.10/site-packages/accelerate/data_loader.py", line 454, in __iter__
[rank4]: current_batch = next(dataloader_iter)
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
[rank4]: data = self._next_data()
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
[rank4]: return self._process_data(data)
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
[rank4]: data.reraise()
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
[rank4]: raise exception
[rank4]: KeyError: Caught KeyError in DataLoader worker process 0.
[rank4]: Original Traceback (most recent call last):
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
[rank4]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
[rank4]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
[rank4]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank4]: File "/home/VITA/vita/util/data_utils_video_audio_neg_frameCat.py", line 704, in __getitem__
[rank4]: image_folder = self.folder_dict[set_id]
[rank4]: KeyError: None
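
The last frame indexes `self.folder_dict` with `set_id`, so `KeyError: None` means `set_id` was `None` for at least one sample. A minimal sketch of how that can happen, assuming the dataset reads the value with something like `sample.get("set")` (the traceback does not show that line, so `lookup_image_folder` is a hypothetical stand-in, not the repository's code):

```python
# FolderDict as given in the config above.
folder_dict = {"sharegpt4": "ShareGPT4V"}


def lookup_image_folder(sample, folder_dict):
    # Hypothetical stand-in for the lookup at line 704 of the dataset file.
    # dict.get returns None when the "set" key is absent from the sample.
    set_id = sample.get("set")
    return folder_dict[set_id]


sample_ok = {"set": "sharegpt4", "id": "000000000164"}
sample_missing = {"id": "000000000164"}  # same item without the "set" key

print(lookup_image_folder(sample_ok, folder_dict))  # ShareGPT4V

try:
    lookup_image_folder(sample_missing, folder_dict)
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: None
```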

@linhaojia13
Collaborator

The data items in your JSON file are missing the key "set". An example JSON:

[
    ...
    {
        "set": "sharegpt4",
        "id": "000000000164",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\n<audio>\n"
            },
            {
                "from": "gpt",  // following the LLaVA convention, "gpt" only marks this turn as the ground truth for the model output
                "value": "This is a well-organized kitchen with a clean, modern aesthetic. The kitchen features a white countertop against a white wall, creating a bright and airy atmosphere. "
            }
        ],
        "image": "coco/images/train2017/000000000164.jpg",
        "audio": [
            "new_value_dict_0717/output_wavs/f61cf238b7872b4903e1fc15dcb5a50c.wav"
        ]
    },
    ...
]
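
One way to confirm this before launching training is to scan the annotation file for items whose "set" is missing or not a key of `FolderDict`. The sketch below is only an illustration: `find_bad_set_items` and `check_file` are hypothetical helpers, not part of the VITA code base, and the extra item ids in the demo are made up.

```python
import json


def find_bad_set_items(items, folder_dict):
    """Return (index, id, set value) for items whose "set" is missing
    or is not a key of folder_dict."""
    bad = []
    for index, item in enumerate(items):
        set_id = item.get("set")  # None when the key is absent
        if set_id not in folder_dict:
            bad.append((index, item.get("id"), set_id))
    return bad


def check_file(json_path, folder_dict):
    """Load a JSON list of data items from json_path and report bad ones."""
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)
    return find_bad_set_items(items, folder_dict)


# Inline demonstration with made-up items.
folder_dict = {"sharegpt4": "ShareGPT4V"}
items = [
    {"set": "sharegpt4", "id": "000000000164"},
    {"id": "000000000165"},                  # "set" missing
    {"set": "coco", "id": "000000000166"},   # "set" not in FolderDict
]
for index, item_id, set_id in find_bad_set_items(items, folder_dict):
    print(f"item {index} (id={item_id}): set={set_id!r} is not in FolderDict")
```

Running `check_file` on the file that `chat_path` points to checks the data the training job actually reads. Note that `json.load` rejects `//` comments, so the comment in the example above has to be dropped in a real file.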

@sankexin
Author

sankexin commented Oct 28, 2024

Thanks for the tip, but I have tried that and the result is the same. My JSON is:

[
    {
        "set": "sharegpt4",
        "id": "000000000164",
        "conversations": [
            {
                "from": "human",
                "value": "\ninput_wavs/promp0.wav\n"
            },
            {
                "from": "gpt",
                "value": "This is a well-organized kitchen with a clean, modern aesthetic. The kitchen features a white countertop against a white wall, creating a bright and airy atmosphere. "
            }
        ],
        "image": "coco/images/train2017/000000000164.jpg",
        "audio": [
            "audio0.wav"
        ]
    }
]

So, do you have an example that has actually been run end to end? Or could this be a bug on my device?
