After CUDA runs out of memory, the loss calculation and summary fail with a float division by zero error. It seems the batch-skipping functionality is not entirely working.
(The screenshot was taken with batch size 16.)
The conda and pip environments are attached: pip_env.txt, conda_env.txt.
Using a smaller batch size is a valid workaround, but any help here would be appreciated.
Edit: At batch size 4 the skip seems to work properly, but it interferes with validation, possibly because batches of size 1 are skipped.
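A minimal, hypothetical sketch of how the division by zero can arise (this is not the project's actual training code): if every batch in an epoch is skipped after an out-of-memory error, the processed-batch counter stays at zero and a naive average divides by it. `SimulatedOOM`, `run_epoch`, and `fake_step` are illustrative names only; in a real PyTorch loop the caught exception would typically be `torch.cuda.OutOfMemoryError` (available in recent PyTorch versions).

```python
from typing import Callable, Iterable, Optional, Sequence


class SimulatedOOM(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def run_epoch(batches: Iterable[Sequence[float]],
              step: Callable[[Sequence[float]], float]) -> Optional[float]:
    """Average the per-batch loss, skipping batches that raise SimulatedOOM.

    Returns None when every batch was skipped, instead of dividing by zero.
    """
    total_loss = 0.0
    processed = 0
    for batch in batches:
        try:
            loss = step(batch)
        except SimulatedOOM:
            # Skip this batch; a real loop might also release cached GPU memory here.
            continue
        total_loss += loss
        processed += 1
    if processed == 0:
        # Every batch was skipped: total_loss / processed would raise ZeroDivisionError.
        return None
    return total_loss / processed


def fake_step(batch: Sequence[float], max_size: int = 4) -> float:
    """Hypothetical training step that 'runs out of memory' on large batches."""
    if len(batch) > max_size:
        raise SimulatedOOM("out of memory")
    return sum(batch) / len(batch)


if __name__ == "__main__":
    print(run_epoch([[1.0] * 16, [1.0] * 16], fake_step))  # all skipped -> None
    print(run_epoch([[1.0, 3.0], [2.0] * 16], fake_step))  # one batch kept -> 2.0
```

With every size-16 batch skipped, the guarded version reports no loss rather than crashing; the summary code would then need to handle the `None` case (for example, by logging that the epoch produced no valid batches).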
Linus-XZX changed the title from "Float division by zero on memory outage" to "Float division by zero on CUDA memory outage" on Jun 17, 2024.