when train KeyError: None #51

sankexin · 2024-10-25T10:01:12Z

train:
export PYTHONPATH=./
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
OUTPUT_DIR=outputs/vita_video_audio
bash script/train/finetuneTask_nodes.sh ${OUTPUT_DIR}

config:
AudioFolder = "input_wavs"
FolderDict = {
#### NaturalCap
"sharegpt4": "ShareGPT4V",
}

#### NaturalCap

ShareGPT4V = {"chat_path": "ShareGPT4V/sharegpt4v_instruct_gpt4-vision_cap100k.json"}

[rank4]: Traceback (most recent call last):
[rank4]: File "/home/VITA/vita/train/train.py", line 407, in <module>
[rank4]: train()
[rank4]: File "/home/VITA/vita/train/train.py", line 387, in train
[rank4]: trainer.train()
[rank4]: File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
[rank4]: return inner_training_loop(
[rank4]: File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2178, in _inner_training_loop
[rank4]: for step, inputs in enumerate(epoch_iterator):
[rank4]: File "/usr/local/lib/python3.10/site-packages/accelerate/data_loader.py", line 454, in __iter__
[rank4]: current_batch = next(dataloader_iter)
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
[rank4]: data = self._next_data()
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
[rank4]: return self._process_data(data)
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
[rank4]: data.reraise()
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
[rank4]: raise exception
[rank4]: KeyError: Caught KeyError in DataLoader worker process 0.
[rank4]: Original Traceback (most recent call last):
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
[rank4]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
[rank4]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank4]: File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
[rank4]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank4]: File "/home/VITA/vita/util/data_utils_video_audio_neg_frameCat.py", line 704, in __getitem__
[rank4]: image_folder = self.folder_dict[set_id]
[rank4]: KeyError: None
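
For reference, the last frame of the traceback can be reproduced in isolation. This is only a minimal sketch: it assumes the dataset reads the set id with something like `item.get("set")`, which the traceback does not show, and `lookup_image_folder` is a hypothetical helper, not a function from the VITA code.

```python
# Hypothetical stand-in for the FolderDict mapping from the config above.
folder_dict = {"sharegpt4": "ShareGPT4V"}


def lookup_image_folder(item, folder_dict):
    # Assumption: the set id is read with .get(), so a missing key yields None.
    set_id = item.get("set")
    # Same lookup as the failing line in the traceback.
    return folder_dict[set_id]


# A data item that has no "set" key at all.
item_without_set = {
    "id": "000000000164",
    "image": "coco/images/train2017/000000000164.jpg",
}

try:
    lookup_image_folder(item_without_set, folder_dict)
except KeyError as err:
    message = f"KeyError: {err}"
    print(message)
```

Under that assumption, any item without a "set" key ends up indexing the dictionary with None, which gives the same message as in the log.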

linhaojia13 · 2024-10-27T11:44:52Z

The data items in your JSON file are missing the "set" key. An example JSON:

[
    ...
    {
        "set": "sharegpt4",
        "id": "000000000164",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\n<audio>\n"
            },
            {
                "from": "gpt",  // following LLaVA's convention, "gpt" only marks this turn as the ground-truth model output
                "value": "This is a well-organized kitchen with a clean, modern aesthetic. The kitchen features a white countertop against a white wall, creating a bright and airy atmosphere. "
            }
        ],
        "image": "coco/images/train2017/000000000164.jpg",
        "audio": [
            "new_value_dict_0717/output_wavs/f61cf238b7872b4903e1fc15dcb5a50c.wav"
        ]
    },
    ...
]
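
A quick way to check a data file for this problem is to scan it before training. The sketch below is an assumption-laden illustration rather than part of VITA: `find_bad_items` is a hypothetical helper, and `folder_dict` is a hand-written copy of the FolderDict from the config in the question.

```python
import json


def find_bad_items(items, folder_dict):
    """Return (index, reason) pairs for items whose "set" value cannot be looked up."""
    problems = []
    for index, item in enumerate(items):
        set_id = item.get("set")
        if set_id is None:
            problems.append((index, 'missing "set" key'))
        elif set_id not in folder_dict:
            problems.append((index, f"unknown set {set_id!r}"))
    return problems


# Hand-written copy of the FolderDict mapping (assumption for illustration).
folder_dict = {"sharegpt4": "ShareGPT4V"}

# Small inline sample: one good item, one without "set", one with an unknown set.
sample = json.loads(
    '[{"set": "sharegpt4", "id": "a"}, {"id": "b"}, {"set": "other", "id": "c"}]'
)

problems = find_bad_items(sample, folder_dict)
for index, reason in problems:
    print(f"item {index}: {reason}")
```

To check a real file, replace the inline sample with `json.load` on the file that "chat_path" points to. Note that Python's `json` module rejects `//` comments, so the comment in the example above is an annotation only and must not appear in the actual data file.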

sankexin · 2024-10-28T02:25:37Z

Thanks for the tip, but I have tried that and the result is the same. My JSON is:

[
    {
        "set": "sharegpt4",
        "id": "000000000164",
        "conversations": [
            {
                "from": "human",
                "value": "\ninput_wavs/promp0.wav\n"
            },
            {
                "from": "gpt",
                "value": "This is a well-organized kitchen with a clean, modern aesthetic. The kitchen features a white countertop against a white wall, creating a bright and airy atmosphere. "
            }
        ],
        "image": "coco/images/train2017/000000000164.jpg",
        "audio": [
            "audio0.wav"
        ]
    }
]

So, do you have any example code that has actually been run end to end? Or could this be a bug on my device?
