[Release/2.4] Remove amax_ptr from scaled_gemm for UT test_scaled_mm_vs_emulated_float_cuda and Updating unit test case based on removing amax from _scaled_mm #1762

amd-sriram · 2024-12-02T20:35:04Z

Fixes https://github.com/ROCm/frameworks-internal/issues/8493 and https://github.com/ROCm/frameworks-internal/issues/10198

Cherry-pick of commit 39a6179.

amax was removed from _scaled_mm by pytorch#128683. Remove it from the internal at::cuda::blas::scaled_gemm as well. This lets hipBLASLt find additional solutions instead of being forced to compute amax and then discard the result.

Also remove the amax comparison from the unit test.
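
For readers unfamiliar with the test, here is a rough sketch of what the comparison looks like once amax is gone. It is a pure-Python illustration only: `matmul`, `emulated_scaled_mm`, `scaled_mm`, and `allclose` are hypothetical stand-ins written for this note, not the PyTorch or hipBLASLt API, and the per-tensor scales are assumed to be plain floats.

```python
# Conceptual sketch only: pure-Python stand-ins, not the PyTorch or hipBLASLt API.

def matmul(a, b):
    """Plain float matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def emulated_scaled_mm(a, b, scale_a, scale_b):
    """Reference path: apply the per-tensor scales first, then multiply."""
    a_deq = [[x * scale_a for x in row] for row in a]
    b_deq = [[x * scale_b for x in row] for row in b]
    return matmul(a_deq, b_deq)


def scaled_mm(a, b, scale_a, scale_b):
    """Hypothetical stand-in for the op under test; returns only the output.

    Before the change, the op also produced amax (assumed here to mean
    max |out|) and the test compared that value too.
    """
    raw = matmul(a, b)
    return [[v * scale_a * scale_b for v in row] for row in raw]


def allclose(x, y, tol=1e-6):
    """Element-wise comparison of two nested lists within an absolute tolerance."""
    return all(abs(p - q) <= tol for r, s in zip(x, y) for p, q in zip(r, s))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [2.0, 0.25]]
out = scaled_mm(a, b, scale_a=0.5, scale_b=2.0)
ref = emulated_scaled_mm(a, b, scale_a=0.5, scale_b=2.0)

# Only the outputs are compared; there is no amax value left to check.
assert allclose(out, ref)
```

The point of the sketch is the shape of the check: one output tensor compared against an emulated reference, with no second return value to validate.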

…comparison in the unit test, removing skip rocm decorator with cherry pick of 3ea3914

okakarpa · 2024-12-02T21:30:28Z

Jenkins build for 9e9eab378eff7d6c0e6abb8d7394bb404c89d506 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

	/opt/rocm/lib/libhsa-runtime64.so.1
	/lib/x86_64-linux-gnu/libm.so.6
[7980/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_spherical_bessel_j0.hip.o
[7981/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_amsgrad_impl.hip.o
[7982/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-12-03T18:19:37Z

Jenkins build for 9e9eab378eff7d6c0e6abb8d7394bb404c89d506 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

1 warning generated when compiling for gfx908.
[7946/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_ForeachBinaryOpScalarList.hip.o
[7947/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_WeightNorm.hip.o
[7948/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_impl.hip.o
[7949/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

amd-sriram added 2 commits December 2, 2024 12:57

Cherry-pick 3ea3914 without scalingtype namespace in Blas.cpp

e403672

amax was removed from _scaled_mm by pytorch#128683. So removing amax …

9e9eab3

…comparison in the unit test, removing skip rocm decorator with cherry pick of 3ea3914

This was referenced Dec 2, 2024

[Release/2.4] Updating unit test case based on removing amax from _scaled_mm and removing amax constraint to find more solutions. #1742

Closed

[Release/2.4] Remove amax_ptr from scaled_gemm for UT test_scaled_mm_vs_emulated_*float*_cuda #1735

Closed

amd-sriram requested review from jeffdaily, pruthvistony, alugorey and jataylo December 2, 2024 20:39
