Pytorch training error when using the MSG3D config #242

Open
cannon281 opened this issue May 27, 2024 · 2 comments

Comments

@cannon281

Hi,

I have been using the pyskl framework with the specified conda environment to train PoseC3D and STGCN++. Training and inference work fine.
However, when I try the MSG3D config (configs/msg3d/msg3d_pyskl_ntu60_xsub_hrnet), PyTorch throws an error about an in-place operation in the model as soon as training starts.
I have tried setting the ReLU activations in msg3d to inplace=False, without success. Any help is much appreciated.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 192, 25, 85]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5051) of binary: /opt/conda/envs/pyskl/bin/python

@HenryMantilla

HenryMantilla commented May 28, 2024

Hi, I ran into the same problem today. What worked for me was changing lines 139, 232, and 316 of msg3d_utils.py: replace every in-place update of the form "something1 += something2" with the out-of-place form "something1 = something1 + something2".

@cannon281

Hi @HenryMantilla,
Thanks for the help. The fix was similar to what you described; for me, the single change below was enough to make training work.

File: pyskl/models/gcns/utils/msg3d_utils.py

Starting at line 232, change

out += res
return self.act(out)

to this

out_res = out + res
return self.act(out_res)
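
For anyone hitting the same message, here is a minimal sketch of why the change above helps. It is not the MSG3D code: the function names and tensor shapes are made up for illustration, and it assumes that `out` in the real module comes straight out of a ReLU, as the "output 0 of ReluBackward0" part of the error suggests. ReLU keeps its output for the backward pass, so `out += res` overwrites a tensor that autograd still needs, while `out + res` writes to a new tensor.

```python
import torch


def forward_inplace(x, res):
    # Hypothetical stand-in for the failing pattern: the ReLU output is
    # saved for backward, then overwritten in place by "+=".
    out = torch.relu(x)
    out += res
    return torch.relu(out)


def forward_out_of_place(x, res):
    # Same computation, but the sum goes into a new tensor, so the saved
    # ReLU output is left untouched.
    out = torch.relu(x)
    out_res = out + res
    return torch.relu(out_res)


x = torch.randn(4, 3, requires_grad=True)
res = torch.randn(4, 3)

inplace_failed = False
try:
    forward_inplace(x, res).sum().backward()
except RuntimeError as err:
    # Expected: "... modified by an inplace operation ..."
    inplace_failed = True
    print("in-place version failed:", str(err)[:80])

forward_out_of_place(x, res).sum().backward()
print("out-of-place version produced a gradient of shape", tuple(x.grad.shape))
```

This would also explain why setting inplace=False on the activations alone did not help: the in-place write comes from the `+=` on the residual, not from the activation itself.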
