Single Node Training #111

Open
xiaokj37 opened this issue Jun 17, 2024 · 3 comments
@xiaokj37

Thanks for open-sourcing Video-ChatGPT; I really like this work.
I am now trying to train Video-ChatGPT myself.
However, I only have a single-node server with 8 RTX 4090 GPUs.
I would like to ask how to modify the initial training command below, which appears to be written for multiple nodes.
```bash
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
    --model_name_or_path <path to LLaVA-7B-Lightening-v-1-1 model> \
    --version v1 \
    --data_path <path to the training JSON produced by the convert_instruction_json_to_training_format.py script> \
    --video_folder <path to the spatio-temporal features generated in step 4 using the save_spatio_temporal_clip_features.py script> \
    --tune_mm_mlp_adapter True \
    --mm_use_vid_start_end \
    --bf16 True \
    --output_dir ./Video-ChatGPT_7B-1.1_Checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 3000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 100 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```
Looking forward to your reply, thank you very much.

@mmaaz60
Member

mmaaz60 commented Jun 22, 2024

Hi @xiaokj37,

I appreciate your interest in our work. Please note that the Video-ChatGPT code is designed to run on a single node with multiple GPUs.

If you face any issues, please let me know. Good luck!
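
For reference, `torchrun` with `--nproc_per_node=8` and no `--nnodes` flag already launches a single-node, 8-process job, so the command above should work as-is on your server. A minimal sketch that makes the single-node setup explicit (`--standalone` is a standard torchrun option, not something specific to this repo; it uses a local rendezvous, so no `--master_port` coordination is needed):

```bash
# One node, eight local worker processes; all other training
# arguments stay exactly as in the command above.
torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    video_chatgpt/train/train_mem.py \
    ...  # same training arguments as above
```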

@xiaokj37
Author

Thanks for your reply.
Currently, I'd like to train Video-ChatGPT on my custom dataset, and my server is equipped with 8 RTX 4090 GPUs. When I run torchrun for training, it fails with CUDA out-of-memory errors. Does Video-ChatGPT need 40 GB of memory on each GPU?

@mmaaz60
Member

mmaaz60 commented Jun 28, 2024

Hi @xiaokj37,

Video-ChatGPT uses a 7B LLM, which requires at least 17 GB of memory just to load. Considering the other model components and optimizer states, I believe a 32 GB GPU might work.

However, please note that the code has only been tested on A100 40 GB GPUs.
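
If you still hit OOM on the 24 GB 4090s, a common mitigation is to reduce the per-device batch size and compensate with gradient accumulation so the effective batch size is unchanged. The sketch below only adjusts flags that already appear in the training command above; it is a suggestion, not a configuration we have tested on 4090s:

```bash
# Effective batch = nproc_per_node * per_device_train_batch_size
#                   * gradient_accumulation_steps.
# Original: 8 * 4 * 1 = 32. Below keeps 8 * 1 * 4 = 32 while cutting
# per-GPU activation memory roughly 4x. Note that in bf16 the 7B weights
# alone take ~7e9 * 2 bytes ≈ 14 GB, so 24 GB cards leave little headroom
# and this may still not fit.
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
    ... \  # other arguments unchanged from the command above
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --gradient_checkpointing True
```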
