CPUOffloadOptimizer issues #1209

Open
felipemello1 opened this issue Nov 1, 2024 · 4 comments
Labels: bug (Something isn't working), optimizer

@felipemello1

Hi all, I was giving the CPUOffloadOptimizer a try and found two issues when using it with QLoRA single device in torchtune:

  1. When using an LR scheduler I got the error below. Maybe there is a way to inherit the optimizer class?
File "/data/users/felipemello/torchtune/torchtune/training/lr_schedulers.py", line 58, in get_cosine_schedule_with_warmup
    return LambdaLR(optimizer, lr_lambda, last_epoch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 336, in __init__
    super().__init__(optimizer, last_epoch, verbose)
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 99, in __init__
    raise TypeError(f"{type(optimizer).__name__} is not an Optimizer")
TypeError: CPUOffloadOptimizer is not an Optimizer
  2. When passing model.parameters() I got the error below (a minimal repro sketch follows the traceback). I imagine a simple fix is to keep only the params that require grad, like the AdamW implementation does.
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torchao/prototype/low_bit_optim/cpu_offload.py", line 76, in __init__
    p_cuda.register_post_accumulate_grad_hook(backward_hook)
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/_tensor.py", line 678, in register_post_accumulate_grad_hook
    raise RuntimeError(
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
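
A minimal sketch that should reproduce this second error (toy model; the layer and lr are illustrative):

import torch
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

model = torch.nn.Linear(8, 8).cuda()
model.weight.requires_grad_(False)  # frozen, like a quantized base weight in QLoRA

# CPUOffloadOptimizer registers a post-accumulate-grad hook on every param,
# so the frozen weight raises:
# RuntimeError: cannot register a hook on a tensor that doesn't require gradient
optim = CPUOffloadOptimizer(model.parameters(), torch.optim.AdamW, lr=1e-4)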

cc: @gau-nernst

@gau-nernst
Collaborator

1 is a known issue. You can see my view here: #959 (comment). I will look into the torch.optim.Optimizer base class to see what could go wrong if I make CPUOffloadOptimizer inherit from it. For example, off the top of my head, CPUOffloadOptimizer will not have self.state.

In the meantime, CPUOffloadOptimizer requires setting the LR manually; see #584 (comment).
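
Something along these lines should work (a sketch; compute_lr is a placeholder for whatever schedule you want):

def set_lr(offload_optim, lr: float) -> None:
    # CPUOffloadOptimizer keeps one small optimizer per parameter in its
    # optim_dict (see the code excerpt below), so set the LR on each of them
    for optim in offload_optim.optim_dict.values():
        for group in optim.param_groups:
            group["lr"] = lr

# in the training loop, instead of scheduler.step():
#   set_lr(optimizer, compute_lr(step))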

For 2, it's an oversight on my part. We can simply add a requires_grad check here; will push a fix:

for p_cuda in params:
    # pre-allocate CPU params and grads
    p_cpu = torch.empty_like(p_cuda, device="cpu", pin_memory=True)
    p_cpu.grad = torch.empty_like(p_cpu, pin_memory=True)
    p_cpu.copy_(p_cuda.detach(), non_blocking=True)
    self.param_cuda2cpu_map[p_cuda] = p_cpu
    p_cuda.register_post_accumulate_grad_hook(backward_hook)
    self.optim_dict[p_cuda] = optimizer_class([{"params": p_cpu, **param_group}], **kwargs)
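
i.e. something along these lines (a sketch of the proposed check, not the final fix):

for p_cuda in params:
    # skip frozen params (e.g. quantized base weights in QLoRA): they have
    # no grads to offload and cannot take a post-accumulate-grad hook
    if not p_cuda.requires_grad:
        continue
    ...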

@fzyzcjy

fzyzcjy commented Nov 18, 2024

Hi, are there any updates? Thanks! It would be great if it could be plugged directly into Hugging Face transformers, but right now it fails with the scheduler issue above:

[10:19:58.912]:     self.trainer.inner.train()
[10:19:58.912]:   File "/opt/conda/lib/python3.11/site-packages/trl/trainer/sft_trainer.py", line 434, in train
[10:19:58.912]:     output = super().train(*args, **kwargs)
[10:19:58.912]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[10:19:58.912]:   File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 2123, in train
[10:19:58.912]:     return inner_training_loop(
[10:19:58.912]:            ^^^^^^^^^^^^^^^^^^^^
[10:19:58.912]:   File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 2224, in _inner_training_loop
[10:19:58.912]:     self.create_optimizer_and_scheduler(num_training_steps=max_steps)
[10:19:58.912]:   File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 1130, in create_optimizer_and_scheduler
[10:19:58.912]:     self.create_scheduler(num_training_steps=num_training_steps, optimizer=optimizer)
[10:19:58.912]:   File "/opt/conda/lib/python3.11/site-packages/transformers/trainer.py", line 1632, in create_scheduler
[10:19:58.912]:     self.lr_scheduler = get_scheduler(
[10:19:58.912]:                         ^^^^^^^^^^^^^^
[10:19:58.912]:   File "/opt/conda/lib/python3.11/site-packages/transformers/optimization.py", line 550, in get_scheduler
[10:19:58.913]:     return schedule_func(
[10:19:58.913]:            ^^^^^^^^^^^^^^
[10:19:58.913]:   File "/opt/conda/lib/python3.11/site-packages/transformers/optimization.py", line 132, in get_linear_schedule_with_warmup
[10:19:58.913]:     return LambdaLR(optimizer, lr_lambda, last_epoch)
[10:19:58.913]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[10:19:58.913]:   File "/opt/conda/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 336, in __init__
[10:19:58.913]:     super().__init__(optimizer, last_epoch, verbose)
[10:19:58.913]:   File "/opt/conda/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 99, in __init__
[10:19:58.913]:     raise TypeError(f"{type(optimizer).__name__} is not an Optimizer")
[10:19:58.913]: TypeError: CPUOffloadOptimizer is not an Optimizer

@gau-nernst
Collaborator

@fzyzcjy To unblock your case, you can try making CPUOffloadOptimizer a subclass of torch.optim.Optimizer, i.e. change the following line

class CPUOffloadOptimizer:

to class CPUOffloadOptimizer(Optimizer):. Make sure not to call super().__init__(); this is just a workaround to pass the class check in the PyTorch LR scheduler. I will investigate whether this causes other issues before merging the fix.
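
Concretely, the edit would look roughly like this (in torchao/prototype/low_bit_optim/cpu_offload.py; signature abbreviated):

from torch.optim import Optimizer

class CPUOffloadOptimizer(Optimizer):  # was: class CPUOffloadOptimizer:
    def __init__(self, params, optimizer_class, **kwargs):  # signature abbreviated
        # deliberately no super().__init__() call: subclassing here only
        # exists to pass the isinstance(optimizer, Optimizer) check in the
        # PyTorch LR scheduler
        ...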

IMO, since Python uses duck typing, the PyTorch LR scheduler should not explicitly check for the optimizer class.

@fzyzcjy

fzyzcjy commented Nov 19, 2024

Thank you!
