Wrong Initialization of msamp in accelerate #3050
Comments
Please see this PR :) I'm pretty sure I caught that in #3023 (it's not merged yet because we're waiting on a PR to be merged in MS-AMP).
@muellerzr Do you think there is a workaround in the meantime?
You can build/install off that branch. If the upstream PR isn't merged here soon, we'll merge it requiring the MS-AMP branch when doing FSDP.
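(As a hedged sketch of that workaround: accelerate can be installed directly from the head ref of PR #3023; the repo path and ref syntax below are assumptions, adjust as needed.)

```sh
pip install "git+https://github.com/huggingface/accelerate.git@refs/pull/3023/head"
```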
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
Currently, when you call accelerator.prepare(model, optimizer) with MS-AMP enabled, the following logic is applied:
device_placement is always set to False, on the assumption that MS-AMP will handle device placement. But that is not the case.
If you look at the logic of initialize in MS-AMP, it never moves the model to a device; it only casts the layers and replaces some of them.
In the example in their official repo, device placement is done first and msamp.initialize happens afterwards (see the sketch below).
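For reference, a condensed sketch of that pattern (the model and optimizer here are illustrative placeholders; opt_level is one of MS-AMP's documented optimization levels):

```python
import torch
import msamp

# Device placement is done first, as in the MS-AMP examples...
model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# ...and msamp.initialize is called afterwards. It only casts/replaces
# layers; it does not move the model to a device.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")
```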
I tried running an example with accelerate and it keeps the model and dataloader on the CPU; without MS-AMP enabled it behaves correctly.
The fix is to call the _prepare_one method with device_placement set to True. A minimal reproduction sketch follows.
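A minimal reproduction sketch, assuming MS-AMP is installed and FP8 is enabled through the Accelerator; the FP8RecipeKwargs usage is an assumption and may differ across accelerate versions:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Enable FP8 through the MS-AMP backend (kwargs API assumed; adjust to
# your accelerate version).
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="MSAMP")],
)

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

# Expected: parameters live on accelerator.device (e.g. cuda:0).
# Observed with MS-AMP enabled: they stay on cpu.
print(next(model.parameters()).device, accelerator.device)
```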
Expected behavior
Every argument passed to prepare should be placed on the proper device.
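Concretely, continuing the reproduction sketch above, a check like this should pass after prepare:

```python
# Hypothetical check of the expected behavior for the prepared model
assert next(model.parameters()).device == accelerator.device
```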
PS: I would love to open a PR to fix this issue ❤️