[Release/2.4] Increase the precision from float32 to float64 for test related to linear algebra #1748

amd-sriram · 2024-11-26T08:36:16Z

This test is skipped on NVIDIA, and it also fails there when forced to run.

On both AMD and NVIDIA GPUs, the run emits the following warning:
/opt/conda/lib/python3.11/site-packages/torch/autograd/graph.py:769: UserWarning: There is a performance drop because we have not yet implemented the batching rule for aten::tril_. Please file us an issue on GitHub so that we can prioritize its implementation. (Triggered internally at ../aten/src/ATen/functorch/BatchedFallback.cpp:81.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass

To make the computation more precise, the test now uses:
dtype = torch.float64

This is a temporary fix, and the test should be revisited once a batching rule is implemented for aten::tril_.

Running the computation in float64 (as NumPy does by default) improves the precision enough for the test to pass. Reference: https://pytorch.org/docs/stable/notes/numerical_accuracy.html#linear-algebra-stability
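For reviewers who want to see the effect in isolation, here is a minimal sketch of why float64 helps. It is not the test touched by this PR and does not use PyTorch: it emulates float32 rounding in pure Python with the standard `struct` module and compares the accumulated error of a dot product in both precisions. The helper names `to_f32` and `dot` are hypothetical and exist only for this illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(xs, ys, round_fn=lambda v: v):
    """Dot product in which every intermediate result passes through round_fn.

    With the default identity round_fn this is a plain float64 dot product;
    with to_f32 it emulates accumulating in float32.
    """
    acc = 0.0
    for x, y in zip(xs, ys):
        prod = round_fn(round_fn(x) * round_fn(y))
        acc = round_fn(acc + prod)
    return acc


n = 10_000
xs = [0.1] * n
ys = [0.1] * n
reference = 100.0  # exact value of n * 0.1 * 0.1 in real arithmetic

err64 = abs(dot(xs, ys) - reference)          # float64 accumulation
err32 = abs(dot(xs, ys, to_f32) - reference)  # emulated float32 accumulation

print(f"float64 error: {err64:.3e}")
print(f"float32 error: {err32:.3e}")
```

The float32 error is several orders of magnitude larger than the float64 error, which is the kind of gap that can push a linear algebra test past its tolerance.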


okakarpa · 2024-11-26T09:58:35Z

Jenkins build for 95706699a3f7bd12c72bb7abd29d9237f48dcb93 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

	/lib/x86_64-linux-gnu/libm.so.6
[7979/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_BinaryMiscOpsKernels.hip.o
[7980/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_spherical_bessel_j0.hip.o
[7981/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_w.hip.o
[7982/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

okakarpa · 2024-11-27T18:29:52Z

Jenkins build for 95706699a3f7bd12c72bb7abd29d9237f48dcb93 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

	/var/lib/jenkins/pytorch/build/lib/libshm.so
	/opt/rocm/lib/libhsa-runtime64.so.1
	/lib/x86_64-linux-gnu/libm.so.6
[7981/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_spherical_bessel_j0.hip.o
[7982/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-12-03T17:58:09Z

Jenkins build for 95706699a3f7bd12c72bb7abd29d9237f48dcb93 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7943/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_u.hip.o
[7944/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_t.hip.o
[7945/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_v.hip.o
[7946/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_scaled_modified_bessel_k1.hip.o
[7947/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Commit 9570669: Running the computation in float64 (as NumPy does by default) improves the precision to pass the test. Reference - https://pytorch.org/docs/stable/notes/numerical_accuracy.html#linear-algebra-stability

amd-sriram requested review from pruthvistony and jataylo November 26, 2024 08:58

amd-sriram self-assigned this Nov 26, 2024

amd-sriram changed the title ~~Increase the precision from float32 to float64 for test related to linear algebra~~ [Release/2.4] Increase the precision from float32 to float64 for test related to linear algebra Nov 26, 2024

amd-sriram marked this pull request as draft November 26, 2024 17:12
