Adding new HWY_AVX10_2 target #2348
Comments
Thanks for starting the discussion! Looks like GNR has also just been introduced/launched, but that supports 10.1, I think. Min/MaxNumber (Min with proper NaN handling per IEEE754:2019) and Min/MaxMagnitude look useful, as does F16 WidenMulPairwiseAdd. Would be very happy to see those added :) I agree we'd want to split the "AVX3" and "512-bit" aspects of x86_512-inl.h. How about I make a TODO for around 2025-03 to lay the groundwork by creating the HWY_AVX10_2 (or HWY_AVX102?) target/boilerplate? Would you later like to add some of its functionality?
MinMagnitude/MaxMagnitude ops are implemented in pull request #2353.
It is possible to go ahead and implement the HWY_AVX10_2 target as GCC 14, Clang 18, and Clang 19 have the
The upcoming Intel AVX10.2 instruction set (which is described in the specification that can be found at https://www.intel.com/content/www/us/en/content-details/828965/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html) adds the following operations:
IfThenElse(Lt(Abs(a), Abs(b)), a, b) (if both a[i] and b[i] are non-NaN)
IfThenElse(Lt(Abs(a), Abs(b)), b, a) (if both a[i] and b[i] are non-NaN)

GCC 15 and Clang 20, which are currently under development and scheduled to be released in Spring 2025, will have support for the new AVX10.2 intrinsics.
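For reference, the two IfThenElse expressions listed above can be modeled per lane in plain scalar C++. This is only a sketch of the select logic for non-NaN inputs; the function names SelectSmallerMagnitude and SelectLargerMagnitude are hypothetical and are not Highway or AVX10.2 identifiers, and NaN behavior is deliberately left out.

```cpp
#include <cmath>

// Hypothetical scalar model of one lane of
//   IfThenElse(Lt(Abs(a), Abs(b)), a, b)
// Only meaningful when neither a nor b is NaN.
inline float SelectSmallerMagnitude(float a, float b) {
  return (std::fabs(a) < std::fabs(b)) ? a : b;
}

// Hypothetical scalar model of one lane of
//   IfThenElse(Lt(Abs(a), Abs(b)), b, a)
// Only meaningful when neither a nor b is NaN.
inline float SelectLargerMagnitude(float a, float b) {
  return (std::fabs(a) < std::fabs(b)) ? b : a;
}
```

Note that when the magnitudes are equal, the comparison is false, so the first expression yields b and the second yields a.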
The new _mm*_cvttsp[h,s,d]_epi* intrinsics available on AVX10.2 should also fix the undefined behavior that occurs with GCC when out-of-range floating-point vectors are converted to integer vectors (this issue is described in #2183).
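To illustrate what a conversion that is defined for every input looks like, here is a minimal scalar sketch. It does not use the AVX10.2 intrinsics; SaturatingTruncToInt32 is a hypothetical helper, and its policy (NaN maps to 0, out-of-range values clamp to the int32_t limits) is an assumption made for illustration that should be checked against the AVX10.2 specification.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: truncating float -> int32_t conversion that avoids the
// undefined behavior of a plain cast for out-of-range or NaN inputs.
// Assumed policy (not taken from the spec): NaN -> 0, otherwise clamp.
inline int32_t SaturatingTruncToInt32(float v) {
  if (std::isnan(v)) return 0;
  // 2^31 is exactly representable as a float, whereas INT32_MAX is not,
  // so compare against 2^31 rather than against INT32_MAX.
  if (v >= 2147483648.0f) return std::numeric_limits<int32_t>::max();
  if (v <= -2147483648.0f) return std::numeric_limits<int32_t>::min();
  // The value is now strictly inside the int32_t range, so the cast is
  // well defined and truncates toward zero.
  return static_cast<int32_t>(v);
}
```

A plain static_cast<int32_t>(1e20f) is undefined behavior in C++, which is why the range checks come before the cast.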
We also need to move the ops for 256-bit or smaller vectors that are currently implemented in the hwy/ops/x86_512-inl.h header on AVX3 targets into a separate header, because support for 512-bit vectors is optional on AVX10.2.