
train without projection layer #17

Open
immaturehearts opened this issue May 27, 2024 · 9 comments

Comments

@immaturehearts

Is there any way to train without the proposed projection layer, as in the ablation study?

@lukasHoel
Contributor

Yes, you can change the configuration in train.sh as follows:

--finetune-config.cross_frame_attention.unproj_reproj_mode "none"

@immaturehearts
Author

I did so, but got this error:

File "/data/ViewDiff/viewdiff/model/custom_attention_processor.py", line 29, in expand_batch
    return x.reshape(n // frames_per_batch, frames_per_batch, *other_dims)
RuntimeError: shape '[0, 3, 16, 1280]' is invalid for input of size 20480
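For context, this failure mode can be reproduced in isolation. The sketch below is a NumPy stand-in, not the actual ViewDiff code (which uses torch and raises RuntimeError); that the layer received a single frame is inferred from the error shape, since 20480 = 1 × 16 × 1280:

```python
import numpy as np

# NumPy stand-in for expand_batch from custom_attention_processor.py
# (illustrative sketch, not the ViewDiff source): the batch dimension
# of n frames is grouped into chunks of frames_per_batch via integer
# division.
def expand_batch(x, frames_per_batch):
    n, *other_dims = x.shape
    return x.reshape(n // frames_per_batch, frames_per_batch, *other_dims)

# Normal case: 6 frames split into 2 groups of 3.
print(expand_batch(np.zeros((6, 16, 1280)), 3).shape)  # (2, 3, 16, 1280)

# Failure case matching the error above: with a single frame,
# 1 // 3 == 0, so the target shape [0, 3, 16, 1280] cannot hold
# the 20480 input elements.
try:
    expand_batch(np.zeros((1, 16, 1280)), 3)
except ValueError as e:
    print("reshape failed:", e)
```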

@lukasHoel
Contributor

Ah sorry, I also forgot about this detail. You also need to add these arguments:

--finetune-config.cross_frame_attention.n_cfa_down_blocks "0"
--finetune-config.cross_frame_attention.n_cfa_up_blocks "0"
--finetune-config.cross_frame_attention.no_cfa_in_mid_block

It will still perform cross-frame attention, because we set

--finetune-config.cross_frame_attention.mode "pretrained"
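Collecting the flags from this thread, the combined set of overrides to disable the projection layer while keeping cross-frame attention looks like this (the `TRAIN_ARGS` array is a hypothetical arrangement; in practice these flags are edited directly into train.sh, with all other arguments unchanged):

```shell
# Combined train.sh overrides from this thread: disable the 3D
# projection layer (as in the ablation study) but keep cross-frame
# attention via the pretrained attention layers.
TRAIN_ARGS=(
    --finetune-config.cross_frame_attention.unproj_reproj_mode "none"
    --finetune-config.cross_frame_attention.n_cfa_down_blocks "0"
    --finetune-config.cross_frame_attention.n_cfa_up_blocks "0"
    --finetune-config.cross_frame_attention.no_cfa_in_mid_block
    --finetune-config.cross_frame_attention.mode "pretrained"
)
```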

@immaturehearts
Author

Thank you so much, it works! By the way, I wonder why the GPU memory usage is the same with or without the projection layer? It's not quite what I expected.

@lukasHoel
Contributor

The memory usage should not be the same. For me, train.sh uses around 42GB of memory with the projection layer, but 32GB without it, on one A100 GPU. I got these numbers from the original train.sh script, modified as discussed in this thread.

@immaturehearts
Author

Oh, I see. I used the train_small.sh and it's nearly the same. Thanks.

@lukasHoel
Contributor

That's expected, because train_small.sh does not use the full architecture. In particular, note the differences in these attributes:

--finetune-config.cross_frame_attention.num_3d_layers "5" (default) vs. "1" (small)
--finetune-config.cross_frame_attention.dim_3d_grid "128" (default) vs. "64" (small)
--finetune-config.cross_frame_attention.n_cfa_down_blocks "1" (default) vs. "0" (small)
--finetune-config.cross_frame_attention.n_cfa_up_blocks "1" (default) vs. "0" (small)

This means that train_small.sh already uses only a very small 3D grid in the bottleneck of the U-Net, and thus does not spend much memory on this layer anyway. So completely removing it does not reduce the memory much further.

@immaturehearts
Author

Okay. I also wonder how many epochs you trained for to get a good result.

@lukasHoel
Contributor

We trained for 60K iterations with 2x A100 GPUs using the train.sh script.
