
Fix FindBLAS.cmake compatibility with MKL 2024.0 #1794

Merged
merged 1 commit into from
Feb 27, 2024

Conversation

@iskunk (Contributor) commented Jan 29, 2024

Description

I am building oneDNN using the Intel oneAPI compiler and MKL version 2024.0.0. I get this error at configuration time:

MKLROOT=/opt/intel/oneapi cmake \
        -DCMAKE_INSTALL_PREFIX=/opt/onednn \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DDNNL_LIBRARY_TYPE=STATIC \
        -DDNNL_BLAS_VENDOR=MKL \
        /home/src/oneDNN
-- CMAKE_BUILD_TYPE is unset, defaulting to Release
-- The C compiler identification is IntelLLVM 2024.0.0
-- The CXX compiler identification is IntelLLVM 2024.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/intel/oneapi/compiler/2024.0/bin/icx - skipped
-- Detecting C compile features
[...]
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found OpenMP_C: -fiopenmp  
-- Found OpenMP_CXX: -fiopenmp  
-- Found OpenMP: TRUE   
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find BLAS (missing: BLAS_LIBRARIES)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  cmake/FindBLAS.cmake:951 (find_package_handle_standard_args)
  cmake/blas.cmake:66 (find_package)
  CMakeLists.txt:126 (include)


-- Configuring incomplete, errors occurred!

I see in cmake/FindBLAS.cmake that oneDNN expects to find the MKL libraries under an mkl/lib/ directory, among other places. However, my oneAPI install has the MKL under a versioned directory:

$ find /opt/intel -name libmkl_rt.so
/opt/intel/oneapi/mkl/2024.0/lib/libmkl_rt.so
/opt/intel/oneapi/mkl/2024.0/lib32/libmkl_rt.so
/opt/intel/oneapi/2024.0/lib/libmkl_rt.so
/opt/intel/oneapi/2024.0/lib32/libmkl_rt.so

That noted, I also get the same configuration error with any of these settings:

MKLROOT=/opt/intel/oneapi/mkl
MKLROOT=/opt/intel/oneapi/mkl/2024.0
MKLROOT=/opt/intel/oneapi/2024.0

If I look in the mkl/ directory, I see that there is a handy latest symlink to avoid hard-coding the version:

$ ls -l /opt/intel/oneapi/mkl/
total 4
drwxr-xr-x 10 root root 4096 Jan 29 21:31 2024.0
lrwxrwxrwx  1 root root    6 Jan 29 21:31 latest -> 2024.0

If I add mkl/latest/lib and variations to BLAS_mkl_LIB_PATH_SUFFIXES, then configuration succeeds, and I am able to build and test oneDNN without issue.
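The change described above could look roughly like the following in cmake/FindBLAS.cmake. This is only a sketch: the exact suffix list and the surrounding logic in the real file may differ.

```cmake
# Sketch: extend the MKL library search suffixes so the versioned
# oneAPI layout is reachable through the "latest" symlink.
# The actual contents of BLAS_mkl_LIB_PATH_SUFFIXES upstream may differ.
list(APPEND BLAS_mkl_LIB_PATH_SUFFIXES
     "mkl/latest/lib"
     "mkl/latest/lib/intel64"
     "latest/lib")
```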

The version of the Intel compiler/libraries I am using is fairly new, and I suspect the oneDNN build may not have been tested against it before now.

@vpirogov (Member) commented Feb 2, 2024

@iskunk, thanks for the report!

We stopped using Intel MKL as the BLAS vendor in oneDNN v1.0, once the built-in JIT implementation reached maturity. Could you please share why you want to build oneDNN with Intel MKL's BLAS?

@iskunk (Contributor, Author) commented Feb 3, 2024

Hello @vpirogov,

I am actually in the course of building PyTorch, but with all the third-party dependencies built separately (as if they were system libraries). My group is using the Intel compiler and MKL, so that is the BLAS that we've set for PyTorch. It appeared that this was also needed for oneDNN. But then I noticed this comment about MKL not being necessary, or even beneficial performance-wise with the current oneDNN codebase.

I was able to build oneDNN successfully without MKL, and am currently using that for PyTorch. But I left this PR up on the basis that if MKL remains a supported build option for the project, then this trivial issue of the MKL library location should be addressed.

An alternative would be to update the CMake code to visibly deprecate MKL usage, or even remove it altogether. The legacy nature of the option is not obvious to an outsider, not least because this project is part of the Intel ecosystem, and one would thus expect to see Intel building blocks as hard dependencies. (That MKL has been sidelined here speaks quite well of the project's independence, I might add.)

Is there a need to retain support for building against the MKL? It seems like a good bit of confusion and complexity could be eliminated with it.

@vpirogov (Member) commented Feb 3, 2024

@iskunk, thanks for the clarification. Your PR does make sense, but it got me thinking about why we are still providing the option to use Intel MKL. I agree that it's currently confusing, and we should drop it or at least mark it as deprecated.

@densamoilov (Contributor) commented Feb 3, 2024

> @iskunk, thanks for clarifications. Your PR does make sense, but it got me thinking why we are still providing option to use Intel MKL. I agree that currently it's confusing and we should drop it or at least mark as deprecated.

@vpirogov, well, technically, we document that we support only the ARMPL and ACCELERATE BLAS vendors, so trying to use any other BLAS vendor would be undefined behavior. One thing we can do here is add a check to the build system to make sure that the BLAS vendor specified by the user is one of the supported ones.
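The check suggested here could look something like the sketch below. The supported-vendor list is taken from the discussion; the variable names and the real build system's structure are assumptions and may well differ.

```cmake
# Sketch: fail configuration early if the user asks for a BLAS vendor
# that is not one of the documented, supported values.
set(_supported_blas_vendors "NONE" "ARMPL" "ACCELERATE")
if(NOT DNNL_BLAS_VENDOR IN_LIST _supported_blas_vendors)
    message(FATAL_ERROR
        "Unsupported DNNL_BLAS_VENDOR value '${DNNL_BLAS_VENDOR}'; "
        "supported values are: ${_supported_blas_vendors}")
endif()
```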

@AngryLoki (Contributor) commented Feb 3, 2024

@iskunk, could you try running:

source /opt/intel/oneapi/mkl/latest/env/vars.sh
cmake -DDNNL_BLAS_VENDOR=MKL

This approach works in Gentoo, including with MKL 2024.0 (there is no ebuild for MKL 2024.0, but I modified an existing one and it worked).

Also, oneDNN still works with generic BLAS libraries. I still find it useful to have as an option, because:

  1. sgemm routines in oneDNN are still slower than in specialized libraries
  2. linking to aocl-blas specifically may improve performance on AMD, unless MKL is tricked into bypassing its anti-AMD code
  3. if an application is already using BLAS for other operations, reusing its gemm kernels saves compilation time and results in smaller binaries.
[Collapsed block: my old test, which was incorrect]

Here is how performance looks at M=N=K for three cases on a Ryzen 7950X3D:

# aocl-blas/mkl/mkl-fixed
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 0.0, C, n);

# onednn
dnnl_sgemm(same parameters, alpha=1, beta=0)

# onednn with all parameters known, alpha=1, beta=0
matmul::primitive_desc{eng, {{M, K}, dt::f32, {lda, 1}}, {{K, N}, dt::f32, {ldb, 1}}, {{M, N}, dt::f32, {ldc, 1}}}

[results chart] (horizontal: M=N=K; vertical: FLOPS as 2MN*K / time, higher is better)

Here oneDNN was built without MKL/BLAS (i.e., with the native gemm kernels). Note the strange sawtooth effect in dnnl_sgemm, by the way; I don't know whether it is expected behavior.

UPD: I take my words about "oneDNN sgemm is slower" back! I did not follow https://oneapi-src.github.io/oneDNN/dev_guide_performance_settings.html and tested with OMP_NUM_THREADS=16 GOMP_CPU_AFFINITY=0-15 instead of the recommended OMP_NUM_THREADS=16 OMP_PLACES=threads OMP_PROC_BIND=spread. With OMP_PROC_BIND=spread there is no degradation anymore, and oneDNN is on par or faster:

[results chart]

It is again the same or faster for bf16bf16f32 gemm (alpha=1, beta=0).

[Collapsed, because oneDNN does not call cblas_gemm_bf16bf16f32 anyway]

[results chart]

I just want to highlight that linking to MKL is indeed a way to shoot oneself in the foot, especially on AMD CPUs.

@iskunk (Contributor, Author) commented Feb 7, 2024

> @iskunk, could you try running

Thanks for the update. I was not using the vars.sh file in any event, since it plays rather fast and loose in how it sets up the environment; all I was doing was setting MKLROOT. The FindBLAS.cmake file already pierces whatever abstraction could have been gained by relying on Intel's environment variables, so I take that to mean that even if configuration finds MKL 2024.0 successfully with vars.sh, it ought to work without it as well.

But that's likely a moot point, given those benchmark results 😃 Are there any valid non-performance reasons for wanting to use MKL, e.g. numerical stability? (That's not my use case, but it could be someone else's.)

@vpirogov (Member) commented:

@iskunk, there are no reasons to build oneDNN with MKL as a BLAS vendor. The only reason this option still exists after oneDNN v1.0 is performance debugging (like @AngryLoki's analysis above).

@vpirogov vpirogov merged commit 77485a9 into oneapi-src:main Feb 27, 2024
0 of 10 checks passed
@vpirogov (Member) commented:

Thank you for your contribution, @iskunk!

To minimize confusion about DNNL_BLAS_VENDOR, I'm considering emitting a warning for options that do not bring a performance benefit. Let me know if you have other ideas.
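Such a warning could be as simple as the sketch below. This is illustrative only; the condition, wording, and placement in the actual build scripts would be up to the maintainers.

```cmake
# Sketch: warn when a BLAS vendor that brings no performance benefit
# (kept only for performance debugging) is selected.
if(DNNL_BLAS_VENDOR STREQUAL "MKL")
    message(WARNING
        "DNNL_BLAS_VENDOR=MKL is intended for performance debugging only; "
        "the built-in oneDNN kernels are expected to perform at least as well.")
endif()
```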

@iskunk (Contributor, Author) commented Feb 28, 2024

Thank you @vpirogov.

I would suggest making the warning for MKL a bit more prominent than others. Given this project's Intel lineage, the expectation for users going in would be for the MKL to be either required, optimal, or at least first among equals. So an unobtrusive warning on that point is more likely to be overlooked, especially if the user is in a hurry.

@vpirogov added the "enhancement" label (a feature or an optimization request) on May 21, 2024