Epoch counter does not resume when resuming from start checkpoint. #26
Yes, the epoch isn't saved anywhere in the model data. The config could actually be saved inside it too... I need to think about what to save. It's not really a bug, just missing functionality.
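One way to act on the idea of saving the config inside the checkpoint, sketched under assumptions: the payload layout (`model`, `epoch`, `config` keys) and both function names are hypothetical, and plain `pickle` stands in for `torch.save` (which also relies on pickle-based serialization) so the sketch runs without PyTorch. On resume it reports which config keys differ from the saved run.

```python
import os
import pickle
import tempfile


def save_checkpoint(path: str, model_state: dict, epoch: int, config: dict) -> None:
    """Write weights plus resume metadata (hypothetical layout)."""
    payload = {"model": model_state, "epoch": epoch, "config": dict(config)}
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_checkpoint(path: str, current_config: dict):
    """Load a checkpoint and list the config keys that differ from this run."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    saved = payload.get("config", {})
    changed = sorted(
        key
        for key in set(saved) | set(current_config)
        if saved.get(key) != current_config.get(key)
    )
    return payload, changed


# Usage: the learning rate was changed between the original run and the resume.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "last_model.ckpt")
    save_checkpoint(path, {"w": [0.1, 0.2]}, epoch=3, config={"lr": 1e-4, "batch_size": 8})
    payload, changed = load_checkpoint(path, {"lr": 5e-5, "batch_size": 8})

print(payload["epoch"], changed)  # 3 ['lr']
```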
Gotcha. I've been manually adjusting the "for epoch in range" values in train.py on every resume, which works, I guess.
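Instead of hand-editing the loop bounds, the start epoch can be derived from the checkpoint. A minimal sketch, assuming a hypothetical convention where the checkpoint dict stores the index of the last finished epoch under `"epoch"`; `remaining_epochs` is an illustrative name, not a function in train.py.

```python
from typing import Optional


def remaining_epochs(checkpoint: Optional[dict], num_epochs: int) -> range:
    """Epochs still to run, replacing a hand-edited `for epoch in range(...)`.

    Hypothetical convention: checkpoint["epoch"] is the index of the last
    epoch that finished; a fresh run passes None.
    """
    if checkpoint is None or "epoch" not in checkpoint:
        start_epoch = 0  # fresh run: begin at the first epoch
    else:
        start_epoch = checkpoint["epoch"] + 1  # continue after the last finished epoch
    return range(start_epoch, num_epochs)


# Usage in a training loop: for epoch in remaining_epochs(ckpt, num_epochs): ...
resumed = list(remaining_epochs({"epoch": 4}, 8))
fresh = list(remaining_epochs(None, 3))
print(resumed, fresh)  # [5, 6, 7] [0, 1, 2]
```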
A while ago I started working on a more "resume-friendly" fork with a --resume CLI argument. It saves the optimizer and scheduler states, plus the epoch, best_sdr and last training loss values, inside the "last_xxx.ckpt" saved model (+ wandb logging here). The code is not bulletproof.
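A minimal sketch of a checkpoint along the lines described above, assuming PyTorch; the function names and dict keys are hypothetical and are not taken from the fork. It stores the optimizer and scheduler `state_dict()`s next to the epoch, best_sdr and last loss, then restores them in place. A toy `nn.Linear` model with Adam and `StepLR` stands in for the real training setup.

```python
import os
import tempfile

import torch
from torch import nn


def save_resume_checkpoint(path, model, optimizer, scheduler, epoch, best_sdr, last_loss):
    """Save everything needed to continue training, not just the weights."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "epoch": epoch,  # index of the last finished epoch
            "best_sdr": best_sdr,
            "last_loss": last_loss,
        },
        path,
    )


def load_resume_checkpoint(path, model, optimizer, scheduler):
    """Restore the states in place; return (start_epoch, best_sdr, last_loss)."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["epoch"] + 1, ckpt["best_sdr"], ckpt["last_loss"]


def make_training_objects():
    # Toy model, optimizer and scheduler standing in for the real ones.
    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    return model, optimizer, scheduler


model, optimizer, scheduler = make_training_objects()
new_model, new_optimizer, new_scheduler = make_training_objects()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "last_model.ckpt")
    save_resume_checkpoint(path, model, optimizer, scheduler, epoch=4, best_sdr=7.5, last_loss=0.12)
    start_epoch, best_sdr, last_loss = load_resume_checkpoint(
        path, new_model, new_optimizer, new_scheduler
    )

print(start_epoch, best_sdr, last_loss)  # 5 7.5 0.12
```

Restoring the optimizer and scheduler states matters because otherwise Adam's moment estimates and the scheduler's step count start over, so the learning-rate schedule would not continue where it left off.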
Ah, thank you, @jarredou!
You should open a PR for that.
It would require more work to turn it into a PR. Like I said, it's not bulletproof in its current state and can lead to some errors, and for the past few months I haven't had any free time to spend on it, unfortunately.
Ah, gotcha.
It seems to reset to 0 every time.