When running a model across several processes using NCCL, the debug formatter output will print the same ID for two GPUs:
```
GPU 0: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)
GPU 1: (dev=Cuda(CudaDevice(DeviceId(1))), shape=[1, 128256], len=128256)
```
It's confusing when looking at logs and trying to figure out which GPU is doing what.
This is because candle uses a per-process atomic counter to assign a device ID (see candle/candle-core/src/cuda_backend/device.rs, lines 35 to 39 at 00d8a0c):
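For context, the counter follows roughly this pattern (a sketch inferred from the debug output above, not the verbatim source):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DeviceId(usize);

impl DeviceId {
    fn new() -> Self {
        // A process-wide counter: every process starts at 1, so two
        // single-GPU processes each hand out DeviceId(1) to their device.
        static COUNTER: AtomicUsize = AtomicUsize::new(1);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }
}
```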
Would it be a problem to include the CUDA device ordinal in the debug formatter? If not, I'll open a PR.
Yeah, feel free to make a PR that would change it to something like `CudaDevice(ordinal:id)`.
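For what it's worth, a minimal sketch of that format (the `ordinal` field and the struct layout here are assumptions for illustration, not candle's actual definition):

```rust
use std::fmt;

pub struct DeviceId(usize); // per-process counter value, as in the snippet above

// Hypothetical minimal layout; the real CudaDevice in candle carries more
// state. `ordinal` is the CUDA device index, which is stable across processes.
pub struct CudaDevice {
    id: DeviceId,
    ordinal: usize,
}

impl fmt::Debug for CudaDevice {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // GPU 0 would print CudaDevice(0:1) and GPU 1 CudaDevice(1:1),
        // making logs from different ranks distinguishable.
        write!(f, "CudaDevice({}:{})", self.ordinal, self.id.0)
    }
}
```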