Torch optimisers expect the step function to run the closure immediately if passed #91

tomogwen · 2024-02-21T12:34:27Z

As per the title: if the closure is not None, we should assume that the parameters' gradients have not been computed yet, and run the closure immediately at the start of step. See the step function in torch.optim.adamw for an example of this.

Without this, the library won't work out of the box with PyTorch Lightning, and I imagine this is also the source of the problem in #90 with the HF transformers/accelerate libraries.
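
For illustration, here is a minimal sketch of the pattern described above, roughly following what the step function in torch.optim.adamw does with its closure. `ClosureFirstSGD` is a hypothetical toy optimiser made up for this example (a plain SGD update), not this library's optimiser. The point is only the ordering: the closure runs first, with gradients re-enabled, and the parameter update follows.

```python
import torch
from torch.optim import Optimizer


class ClosureFirstSGD(Optimizer):
    """Hypothetical plain-SGD optimiser, used only to illustrate closure handling."""

    def __init__(self, params, lr=0.1):
        super().__init__(params, defaults={"lr": lr})

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            # The gradients may not exist yet: run the closure before
            # touching any parameter, with grad mode re-enabled so that
            # the backward() call inside the closure works.
            with torch.enable_grad():
                loss = closure()

        # Only now read p.grad and apply the update.
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.add_(p.grad, alpha=-group["lr"])
        return loss


# Usage: the caller never calls backward() itself, only the closure does.
w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
opt = ClosureFirstSGD([w], lr=0.1)


def closure():
    opt.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    return loss


loss = opt.step(closure)
```

If step instead ran the update first and the closure afterwards (or ignored the closure), `p.grad` would be None or stale on the first call, which is the failure mode described here for callers that rely on the closure to compute gradients.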

stale · 2024-04-22T06:13:19Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

run closure at start of step function to produce gradients

7022575

stale bot added the stale label Apr 22, 2024
