Release Notes
Here we draft the release notes for the next release.
Note: the format is [summary] [commit hash or PR#] [author(s)]
Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for ????
The first line of the commit message is usually a good summary, but please think through each entry and (re)write a summary that helps users quickly determine whether the change would be interesting or useful to them. For example, include the name of the intrinsic/function in the summary so that users don't have to click through each commit themselves.
The implementation of all NEON intrinsics is now complete! (@yyctw @wewe5215)
SIMDe PRs are tested using Fedora Rawhide (@junaruga)
There are a total of 6876 SIMD functions on x86, of which SIMDe implements 2937 (42.71%) so far. Of the 5270 functions currently in AVX-512, SIMDe implements 1516 (28.77%).
- AES: 5 of 6 (83.33%)
- cvtus_storeu: 1 of 18 (5.56%) implemented.
- fpclass: 3 of 24 (12.50%) implemented.
- i32gather: 1 of 8 (12.50%) implemented.
- i64gather: 8 of 8 💯
- kand
- permutex
- rcp
- reduce
- AVX512BW: 6 additional, 337 total of 790 (42.66%)
- AVX512DQ: 5 additional, 112 total of 376 (29.79%)
- AVX512F: 48 additional, 1087 total of 2824 (38.49%)
- AVX512_FP16: 15 additional, 17 total of 1105 (1.54%)
SIMDe now implements 6670 out of 6670 (100.00%) NEON functions, up from 56.46% in the previous release!
- abal
- abal_high
- abd
- abdh
- abdl_high
- addhn_high
- aes
- bfdot
- bfdot_lane
- cadd_rot
- cale
- calt
- cmla_lane
- cmla_rot_lane
- copy_lane
- cvt_high
- cvt_n
- cvta
- cvtn
- cvtp
- cvtx
- cvtx_high
- div
- dupb_lane
- duph_lane
- eor3
- fmlal
- fms
- fms_lane
- fms_n
- ld2_dup
- ld2_lane
- ld3_dup
- ld3_lane
- ld4_dup
- maxnmv
- minnmv
- mla_lane
- mla_high_lane
- mls_lane
- mlsl_high_lane
- mmla
- mull_high_lane
- mull_high_n
- mulx
- mulx_lane
- pmaxnm
- pminnm
- qdmlal
- qdmlal_high
- qdmlal_high_lane
- qdmlal_high_n
- qdmlal_lane
- qdmlal_n
- qdmlsl
- qdmlsl_high
- qdmlsl_high_lane
- qdmlsl_high_n
- qdmlsl_lane
- qdmlsl_n
- qdmlslh
- qdmlslh_lane
- qdmulhh
- qdmulhh_lane
- qdmull_high
- qdmull_high_lane
- qdmull_high_n
- qdmull_lane
- qdmull_n
- qdmullh_lane
- qmovun_high
- qrdmlah
- qrdmlah_lane
- qrdmlahh
- qrdmlahh_lane
- qrdmlsh
- qrdmlsh_lane
- qrdmlshh
- qrdmlshh_lane
- qrdmulhh_lane
- qrshl
- qrshlh
- qrshrn_high_n
- qrshrnh_n
- qrshrun_high_n
- qrshrunh_n
- qshl_n
- qshlh_n
- qshluh_n
- qshrn_high_n
- qshrnh_n
- qshrun_high_n
- qshrunh_n
- raddhn
- raddhn_high
- rax
- recp
- rnd32x
- rnd64z
- rnda
- rndx
- rshrn_high_n
- rsubhn
- set_lane
- sha1
- sha1h
- sha256
- sha512
- shll_high_n
- shrn_high_n
- sli_n
- sm3
- sm4
- sqrt
- st1_x2
- st1_x3
- st1_x4
- st1q_x2
- st1q_x3
- st1q_x4
- subhn_high
- sudot_lane
- usdot
- usdot_lane
Families now fully implemented
- cvtn
- mla_lane
simde-f16: improve _Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
simde_float16: prefer __fp16 if available aba26f6 @mr-c
cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations e134cc7 @mr-c
cvtn: vcvtnq_u32_f32 is a V8 function 8432c70 @mr-c
min: Remove non-working MMX specialization from simde_vmin_s16 6858b92 @M-HT
shll: Extend constant range in simde_vshll_n_XXX intrinsics (#1064) beb1c61 @M-HT
various: Implement some f16XN types and f16-related intrinsics (#1071) aae2245 @yyctw
qtbl/qtbx polyfills for A32V7 a2fef9e @easyaspi314
arm: use SIMDE_ARCH_ARM_FMA 7198d6d @mr-c
arm neon: Complex operations from Armv8.3-A (#1077) d08d67c @wewe5215
more fp16 using intrinsics supported by architecture v7 (skip version) (#1081) 5e7c4d4 @yyctw
st1{,q}_*_x{2,3,4}: initial implementation (#1082) 879d1a0 @yyctw
Implement all intrinsics supported by architecture A64, part 1 (#1090) 2eedece @yyctw
Add AES instructions 23adcd2 805ccd2 @yyctw
Modified simde_float16 to simde_float16_t (#1100) 8a05dc6 @yyctw
Implement all intrinsics supported by architecture A64, remaining part (#1093) 018ba24 @yyctw
Enable vmlaq_laneq_f32 and vcvtq_n_f64_u64 c7d314b @yyctw
Implement all bf16-related intrinsics (#1110) c59db7c @yyctw
simd128: fix altivec_p7 version of wasm_f64x2_pmin 96d6e53 @mr-c
simd128: add missing unsigned functions ea5e283 @mr-c
simd128 f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c
sse{,2,4.1}, avx{,2} *stream{,load}: use __builtin_nontemporal_{load,store} 6ce6030 @mr-
sse: Fix issues related to MXCSR register (#1060) 653aba8 @M-HT
sse: implement _mm_movelh_ps for Arm64 514564e @mr-c
sse _mm_movemask_ps: remove unused code fba97e4 @mr-c
sse2 _mm_pause: more archs, add a basic test 692a2e8 @mr-
sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 edd4678 @mr-c
sse4.1 _mm_testz_si128: fix backwards short circuit logic f132275 @mr-c
run test from #926 ce9708c @mr-c
simde_mm256_shuffle_pd fix for natural vector size < 128 1594d7c @mr-c
fpclass: naive implementation 353bf5f @mr-c
loadu: fix native detection 305f434 @mr-c
set: add simde_x_mm512_set_m256{,d} 67e0c50 @mr-c
gather: add MSVC native fallbacks 7b7e3f6 @mr-c
AVX512FP16 / m512h initial support e97691c @mr-c
fix many native aliases 75014b9 @mr-c
fix natives, some require VPCLMULQDQ f819c52 @mr-c
enable SIMDE_X86_SVML_NATIVE for MSVC 2019+ 593af95 @mr-c
aes: initial implementation of most aes instructions (#1072) 8632391 @Vineg
msa neon impl: float64x2_t is not available in A32V7 ae4c4ab @mr-c
fix SIMDE_ARCH_X86_SSE4_2 define 5e4b308 @cbielow
x86 aes: add neon implementation using the crypto extension fb3554f @mr-
neon/st1: disable last remaining AltiVec implementation 0521245 @mr-c
sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
wasm simd128: more powerpc fixes 7cb5691 @mr-c
GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 3fa89c5 @mr-c
GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3 edde42e @mr-c
GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+ 43d86a3 @mr-c
Add workaround for GCC bug 111609 fdafd8e @M-HT
arm neon ld2: silence warnings at -O3 on gcc risc-v 8f56628 @mr-c
clang powerpc: vec_bperm bug was fixed in clang-14 6feb28a @mr-c
clmul: aarch64 clang has difficulties with poly64x1_t 1e1bd76 @mr-c
clang aarch64: optimization bug 45541 was fixed in clang-15 7ca5712 @mr-c
A32V7: Don't trust clang for load multiple on A32V7 927f141 @easyaspi314
clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 25cebbe @mr-c
fp16: don't use _Float16 on ClangCL if not supported 8a6b8c5 @mr-c
svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCL c877fe5 @mr-
avx512 types: avoid using native AVX512 types on MSVC unless required 029d749 @mr-c
Update recipe for qemu git mode 54b8c8f @mr-c
riscv64 gcc: typo fix for endian little 7423339 @mr-c
add new cross sets; Ubuntu Focal and Bionic support b0b9710 @mr-c
native tests: also AVX512, MSA; fix WASM SIMD128 path bdd075b @mr-c
test-flags: support the x86 microarchitecture levels 518b777 @mr-c
preserve test log 9815161 @mr-c
save meson log on error 5207d83 @mr-
circleci: clang, set -Wno-unsafe-buffer-usage 24c93c2 @mr-c
upgrade qemu ; fixes remaining ppc64el fails! e91944b @mr-c
tidy matrix ordering for easier to read job names b52ac36 @mr-c
add clang-qemu: aarch64, riscv64, ppc64el, s390x 8a6dbab @mr-c
test armv7 with gcc-12 via qemu 8cd8de1 @mr-c
add armel to gcc and clang qemu matrices 4ca849b @mr-c
add armv7 to clang-qemu matrix a144aca @mr-c
use GCC 12 for adv x64 native testing + AVX512FP f156b41 @mr-c
expand mac-os/xcode testing matrix 8055410 @mr-c
fix macos-13+brew failure c6149de @mr-c
test with clang-16 e25ced8 @mr-c
add gcc-13 43ac8fc @mr-c
run on commits to the primary branch to prime the cache 6055bfb @mr-c
Start testing SIMDe PRs using Fedora Rawhide d64b103 6ae0763 b309d89 4d55fc2 643c419 @junaruga
restart testing with Travis CI 93905f5 @mr-c
simplify x86 ISA matrix 6b7c1b3 @mr-c
README: mark F16C as complete 2d87cf5 @mr-c
README: Give credit to creator/maintainer of the vcpkg for SIMDe ceb1e73 @mr-c
README: related projects: add AvxToNeon 13bf92a @mr-c
Template for next time
# Summary
## [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md)
### Newly added function families
### Additions to existing families
## [Neon](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md)
## [MSA](https://github.com/simd-everywhere/implementation-status/blob/main/msa.md)
# Details
## Implementation of Arm intrinsics
### NEON
### SVE Intrinsics
## WASM intrinsics
## x86 intrinsics
### SSE*
### AVX
### AVX2
### AVX512
### GFNI
### XOP
### F16C
### FMA
### SVML
## MIPS MSA intrinsics
## Arch support
### arm64
### z/Arch
### Altivec
### e2k (Elbrus)
### Power
## Testing with Docker/Podman & CI
### [Appveyor](https://ci.appveyor.com/project/nemequ/simde/history)
### [Azure](https://dev.azure.com/simd-everywhere/SIMDe/_build?definitionId=3)
### [Circle CI](https://app.circleci.com/pipelines/github/simd-everywhere/simde)
### [Cirrus CI](https://cirrus-ci.com/github/simd-everywhere/simde)
### [Local testing with Docker/Podman](https://github.com/simd-everywhere/simde/tree/master/docker#readme)
### [Drone.io](https://cloud.drone.io/simd-everywhere/simde)
### [GitHub Actions](https://github.com/simd-everywhere/simde/actions)
### [Netlify](https://app.netlify.com/sites/simde/)
### [Packit CI](https://dashboard.packit.dev/projects/github.com/simd-everywhere/simde)
### [Semaphore CI](https://nemequ.semaphoreci.com/projects/simde)
### [Travis](https://app.travis-ci.com/github/simd-everywhere/simde)
## Misc