Advice on sharding for data parallelism #566

DBraun · 2024-08-24T23:58:02Z

I'm using code similar to the 8-way batch data parallelism example here: https://jax.readthedocs.io/en/latest/notebooks/Distributed_arrays_and_automatic_parallelization.html#way-batch-data-parallelism

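import jax
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(mesh_utils.create_device_mesh((8,)), 'batch')
sharding = NamedSharding(mesh, P('batch'))
replicated_sharding = NamedSharding(mesh, P())
batch = jax.device_put(batch, sharding)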

As I understand it, if batch has an image-like shape (B, H, W, C) and B is a multiple of 8, the data is split evenly across the 8 devices along the batch axis. Can Grain prepare a batch like this? I haven't been able to work it out from ShardOptions.
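To make the expected layout concrete, here is a minimal sketch that shards a host-side NumPy batch and inspects what each device holds. The array is a hypothetical stand-in for one batch from a data loader; the image size (32, 32, 3) and `per_device = 4` are assumptions for illustration. It uses however many devices are present rather than hard-coding 8, so it also runs on a single CPU.

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Use all visible devices (8 in the post above; 1 on a plain CPU host).
devices = np.array(jax.devices())
n_dev = devices.size
mesh = Mesh(devices, ("batch",))
sharding = NamedSharding(mesh, P("batch"))

# Hypothetical host batch shaped (B, H, W, C), with B a multiple of n_dev.
per_device = 4
batch = np.zeros((per_device * n_dev, 32, 32, 3), dtype=np.float32)

# Split along the leading (batch) axis; H, W, C stay whole on every device.
global_batch = jax.device_put(batch, sharding)

# Shape of the slice held by each device.
shard_shapes = [s.data.shape for s in global_batch.addressable_shards]
print(global_batch.shape, shard_shapes[0])
```

If B is not divisible by the number of devices along the 'batch' axis, `jax.device_put` should reject the array, so dropping the last partial batch in the loader is the usual way to keep shapes valid.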


DBraun · 2024-11-13T16:47:01Z

Is there a solution with ShardByJaxProcess?
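One possible direction, sketched under assumptions rather than confirmed in this thread: in a multi-process setup, each process loads only its own slice of the data (which is what `grain.python.ShardByJaxProcess` is meant to arrange), and the per-process batches are then assembled into one global array with `jax.make_array_from_process_local_data`. That function requires a reasonably recent JAX release. The helper name `to_global_batch`, the batch sizes, and the NumPy array standing in for a Grain batch are all hypothetical. In a single process the local and global batch coincide, so the sketch runs as is.

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def to_global_batch(local_batch, sharding):
    """Assemble this process's NumPy batch into a global, sharded jax.Array."""
    return jax.make_array_from_process_local_data(sharding, local_batch)


# One mesh axis over all devices across all processes.
mesh = Mesh(np.array(jax.devices()), ("batch",))
sharding = NamedSharding(mesh, P("batch"))

# Hypothetical per-process batch. In a real pipeline this would come from a
# Grain loader whose sampler uses shard options for the current JAX process,
# with a batch size that is a multiple of jax.local_device_count().
local_b = 2 * jax.local_device_count()
local_batch = np.ones((local_b, 32, 32, 3), dtype=np.float32)

global_batch = to_global_batch(local_batch, sharding)
print(global_batch.shape)
```

With this arrangement the loader's batch size is the per-process batch, and the global batch dimension is that value times `jax.process_count()`.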
