
Wrong Initialization of msamp in accelerate #3050

Closed
2 of 4 tasks
Atharva-Phatak opened this issue Aug 26, 2024 · 3 comments · Fixed by #3093

Comments

@Atharva-Phatak

System Info

Python : 3.10.12
Accelerate : 0.33.0
System : Pop Os

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Currently, when you call accelerator.prepare(model, optimizer) with msamp enabled, the following logic is applied:

if self.mixed_precision == "fp8" and self.fp8_recipe_handler.backend == "MSAMP":
    args = self._prepare_msamp(*args)
    # MS-AMP will handle the device placement
    device_placement = [False for _ in args]
result = tuple(
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
)
result = tuple(self._prepare_one(obj, device_placement=d) for obj, d in zip(result, device_placement))

As you can see, device_placement is always False, on the assumption that msamp will handle device placement. But that's not the case.

If you look at the logic of initialize in msamp, it doesn't move the model to the device; it just casts the layers and replaces some of them.

In the example in their official repo, device placement is done first, and only then does msamp.initialize happen.

I tried running an example with accelerate and it keeps the model and dataloader on CPU. Without msamp enabled, it behaves correctly.

The fix is to call the _prepare_one method with device_placement set to True.
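To make the ordering issue concrete, here is a pure-Python sketch (mock objects and hypothetical names, no real MS-AMP or torch) of why forcing device_placement to False strands the model on CPU, while placing it on the device first, as in the MS-AMP repo's example, works:

```python
class MockModel:
    """Stand-in for a model; tracks its device and dtype."""

    def __init__(self):
        self.device = "cpu"
        self.dtype = "fp32"

    def to(self, device):
        self.device = device
        return self


def mock_msamp_initialize(model):
    # Like msamp.initialize: casts/replaces layers, but never touches the device.
    model.dtype = "fp8"
    return model


# Current accelerate behavior: device placement is skipped entirely.
model = mock_msamp_initialize(MockModel())
print(model.device)  # still "cpu" -- the bug described above

# Pattern from the MS-AMP repo: place on device first, then initialize.
model = mock_msamp_initialize(MockModel().to("cuda:0"))
print(model.device)  # "cuda:0"
```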

Expected behavior

Every argument passed to prepare should be placed on the proper device.

PS: I would love to open a PR to fix this issue ❤️

@muellerzr
Collaborator

muellerzr commented Aug 26, 2024

Please see this PR :) I’m pretty sure I caught that #3023 (it’s not merged yet bc we’re waiting on a pr to be merged in MSAMP)

@Atharva-Phatak
Author

@muellerzr Do you think there is a workaround in the meantime?

@muellerzr
Collaborator

muellerzr commented Aug 26, 2024

You can build/install off that branch. If the upstream PR isn't merged here soon, we'll merge it and require the MSAMP branch when doing FSDP.

@github-staff github-staff deleted a comment from Lxx-c Oct 23, 2024