Pytorch training error when using the MSG3D config #242

cannon281 · 2024-05-27T05:06:35Z

Hi,

I have been using the pyskl framework with the specified conda environment to train PoseC3D and STGCN++. Training and inference work fine.
However, when I tried the MSG3D config (configs/msg3d/msg3d_pyskl_ntu60_xsub_hrnet), PyTorch throws an error about an in-place operation in the model as soon as training starts.
I have tried setting the ReLU activations in msg3d to inplace=False, without success. Any help is much appreciated.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 192, 25, 85]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5051) of binary: /opt/conda/envs/pyskl/bin/python


HenryMantilla · 2024-05-28T17:49:46Z

Hi, I was facing the same problem today. What worked for me was changing lines 139, 232, and 316 in the msg3d_utils.py file. Basically, replace every "something1 += something2" with "something1 = something1 + something2".
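
A rough illustration of why that replacement matters (a minimal sketch, not code from pyskl; the tensor names are hypothetical): autograd keeps a version counter on every tensor, visible through the internal `_version` attribute. An in-place `+=` bumps the counter on the existing tensor, while `a = a + b` builds a new tensor and leaves the original alone. This matches the "is at version 1; expected version 0" wording in the error above.

```python
import torch

b = torch.tensor([1.0, 2.0, 3.0])

# In-place add: the same tensor object is modified, so its version counter increases.
a = torch.tensor([1.0, 2.0, 3.0])
a_version_before = a._version      # `_version` is an internal attribute, used here only for illustration
a += b
a_version_after = a._version

# Out-of-place add: `c` is rebound to a new tensor; the original tensor is never touched.
c = torch.tensor([1.0, 2.0, 3.0])
c_original = c
c_version_before = c._version
c = c + b
c_original_version_after = c_original._version

print("in-place bumped version:", a_version_after > a_version_before)
print("out-of-place kept original version:", c_original_version_after == c_version_before)
```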

cannon281 · 2024-05-31T02:46:45Z

Hi @HenryMantilla,
Thanks for the help. The fix was similar to what you described; for me, the change below made it work.

File: pyskl/models/gcns/utils/msg3d_utils.py

From line 232, change

out += res
return self.act(out)

to this

out_res = out + res
return self.act(out_res)
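
For anyone who wants to check this outside of pyskl, here is a minimal sketch of the same pattern. It is not the MSG3D code: the function names and shapes are made up, and it assumes the tensor hit by `+=` comes straight out of a ReLU, which is what "output 0 of ReluBackward0" in the error suggests. ReLU keeps its output for the backward pass, so overwriting that output in place breaks the gradient computation even when the ReLU itself uses inplace=False.

```python
import torch
import torch.nn as nn

act = nn.ReLU()  # inplace=False is the default


def residual_inplace(x, res):
    """Hypothetical residual step using the failing pattern."""
    out = act(x)   # ReLU saves this output for its backward pass
    out += res     # in-place add overwrites the saved output
    return act(out)


def residual_out_of_place(x, res):
    """Same step with the fix from this thread."""
    out = act(x)
    out_res = out + res  # new tensor; the saved ReLU output stays intact
    return act(out_res)


def backward_succeeds(fn):
    """Run one forward/backward pass and report whether autograd completed."""
    x = torch.randn(4, 8, requires_grad=True)
    res = torch.randn(4, 8)
    try:
        fn(x, res).sum().backward()
    except RuntimeError:
        return False
    return x.grad is not None


print("in-place version trains:", backward_succeeds(residual_inplace))
print("out-of-place version trains:", backward_succeeds(residual_out_of_place))
```

The sketch runs on CPU, so it does not need the pyskl environment or a GPU to try.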
