Commit
[ROCm] Fix fp32 atomicAdd for non-MI100 GPUs (pytorch#128750)
Current implementation is very specific to MI100.
This is causing performance degradation for other GPUs.

Fixes pytorch#128631

Benchmarking on MI300X:
```
Before: 1918.5126953125 ms
After: 0.8285150527954102 ms
```

Co-authored-by: Jeff Daily <[email protected]>
Pull Request resolved: pytorch#128750
Approved by: https://github.com/xw285cornell

(cherry picked from commit 1f0a68b)