Debug formatter for Tensor is confusing with > 1 GPU #2619

Open
zackangelo opened this issue Nov 15, 2024 · 1 comment

Comments

@zackangelo
Contributor

When running a model across several processes using NCCL, the debug formatter output will print the same ID for two GPUs:

GPU 0: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)
GPU 1: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)

It's confusing when looking at logs and trying to figure out which GPU is doing what.

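This is because candle assigns each device ID from an atomic counter that is local to the process (PID). With one process per GPU, every process's counter starts from the same value, so each device ends up with the same ID regardless of its CUDA ordinal, and the debug formatter prints only that ID: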

impl std::fmt::Debug for CudaDevice {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "CudaDevice({:?})", self.id)
    }
}
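
To illustrate why the IDs collide, here is a minimal sketch of a per-process atomic counter. It is an illustration of the pattern described above, not a copy of candle's source: the `DeviceId` type, its `new` constructor, and the starting value of 1 (chosen to match the `DeviceId(1)` in the logs) are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a per-process device identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceId(usize);

impl DeviceId {
    /// Hands out the next ID from a process-wide counter.
    /// The `static` lives in this process's memory only, so a second
    /// process starts again from the same initial value.
    pub fn new() -> Self {
        static COUNTER: AtomicUsize = AtomicUsize::new(1);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }
}
```

Within one process, successive calls return distinct IDs. Across processes, the first device created in each one gets the same ID, which is the situation in the log output above.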

Would it be a problem to include the CUDA device ordinal in the debug formatter? If not, I'll open a PR.

@LaurentMazare
Collaborator

Yeah, feel free to open a PR that changes it to something like CudaDevice(ordinal:id).
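
For reference, a minimal sketch of what a formatter in the suggested `CudaDevice(ordinal:id)` shape could look like. The struct layout here is hypothetical: the `ordinal` field name and the simplified `CudaDevice` and `DeviceId` definitions are assumptions for illustration, not candle's actual types.

```rust
use std::fmt;

/// Hypothetical per-process device identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceId(pub usize);

/// Simplified stand-in for the real device type; only the two
/// fields relevant to formatting are modeled.
pub struct CudaDevice {
    pub id: DeviceId,
    /// CUDA device ordinal (assumed field name).
    pub ordinal: usize,
}

impl fmt::Debug for CudaDevice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the ordinal first so logs from different processes
        // can be told apart even when the per-process ID is identical.
        write!(f, "CudaDevice({}:{:?})", self.ordinal, self.id)
    }
}
```

With this shape, two processes that both hold `DeviceId(1)` but sit on ordinals 0 and 1 would print `CudaDevice(0:DeviceId(1))` and `CudaDevice(1:DeviceId(1))` respectively.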
