
train without projection layer #17

Open
immaturehearts opened this issue May 27, 2024 · 9 comments

Comments

@immaturehearts

Is there any way to train without the proposed projection layer, as in the ablation study?

@lukasHoel
Contributor

Yes, you can change the configuration in train.sh as follows:

--finetune-config.cross_frame_attention.unproj_reproj_mode "none"

@immaturehearts
Author

I did so, but got this error:

File "/data/ViewDiff/viewdiff/model/custom_attention_processor.py", line 29, in expand_batch
    return x.reshape(n // frames_per_batch, frames_per_batch, *other_dims)
RuntimeError: shape '[0, 3, 16, 1280]' is invalid for input of size 20480
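For context, this failure mode can be reproduced in isolation. The sketch below is a NumPy stand-in, not the actual ViewDiff code (which uses torch and raises RuntimeError); that the layer received a single frame is inferred from the error shape, since 20480 = 1 × 16 × 1280:

```python
import numpy as np

# NumPy stand-in for expand_batch from custom_attention_processor.py
# (illustrative sketch, not the ViewDiff source): the batch dimension
# of n frames is grouped into chunks of frames_per_batch via integer
# division.
def expand_batch(x, frames_per_batch):
    n, *other_dims = x.shape
    return x.reshape(n // frames_per_batch, frames_per_batch, *other_dims)

# Normal case: 6 frames split into 2 groups of 3.
print(expand_batch(np.zeros((6, 16, 1280)), 3).shape)  # (2, 3, 16, 1280)

# Failure case matching the error above: with a single frame,
# 1 // 3 == 0, so the target shape [0, 3, 16, 1280] cannot hold
# the 20480 input elements.
try:
    expand_batch(np.zeros((1, 16, 1280)), 3)
except ValueError as e:
    print("reshape failed:", e)
```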

@lukasHoel
Contributor

Ah sorry, I also forgot about this detail. You also need to add these arguments:

--finetune-config.cross_frame_attention.n_cfa_down_blocks "0"
--finetune-config.cross_frame_attention.n_cfa_up_blocks "0"
--finetune-config.cross_frame_attention.no_cfa_in_mid_block

It will still perform cross-frame attention, because we set

--finetune-config.cross_frame_attention.mode "pretrained"
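Collecting the flags from this thread, the combined set of overrides to disable the projection layer while keeping cross-frame attention looks like this (the `TRAIN_ARGS` array is a hypothetical arrangement; in practice these flags are edited directly into train.sh, with all other arguments unchanged):

```shell
# Combined train.sh overrides from this thread: disable the 3D
# projection layer (as in the ablation study) but keep cross-frame
# attention via the pretrained attention layers.
TRAIN_ARGS=(
    --finetune-config.cross_frame_attention.unproj_reproj_mode "none"
    --finetune-config.cross_frame_attention.n_cfa_down_blocks "0"
    --finetune-config.cross_frame_attention.n_cfa_up_blocks "0"
    --finetune-config.cross_frame_attention.no_cfa_in_mid_block
    --finetune-config.cross_frame_attention.mode "pretrained"
)
```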

@immaturehearts
Author

Thank you so much, it works! By the way, I wonder why the GPU memory usage is the same with or without the projection layer? It's not quite what I expected.

@lukasHoel
Contributor

The memory usage should not be the same. For me, train.sh uses around 42GB of memory with the projection layer, but 32GB without it, on one A100 GPU. I got these numbers from the original train.sh script, modified as discussed in this thread.

@immaturehearts
Author

Oh, I see. I used the train_small.sh and it's nearly the same. Thanks.

@lukasHoel
Contributor

That's expected, because train_small.sh does not use the full architecture. In particular, note the differences in these attributes:

--finetune-config.cross_frame_attention.num_3d_layers "5" (default) vs. "1" (small)
--finetune-config.cross_frame_attention.dim_3d_grid "128" (default) vs. "64" (small)
--finetune-config.cross_frame_attention.n_cfa_down_blocks "1" (default) vs. "0" (small)
--finetune-config.cross_frame_attention.n_cfa_up_blocks "1" (default) vs. "0" (small)

This means that train_small.sh already uses only a very small 3D grid in the bottleneck of the U-Net, and thus does not spend much memory on this layer anyway. So completely removing it does not reduce the memory much further.

@immaturehearts
Author

Okay. I also wonder how many epochs you trained for to get a good result.

@lukasHoel
Contributor

We trained for 60K iterations with 2x A100 GPUs using the train.sh script.
