Michael R. Crusoe edited this page Nov 20, 2023 · 37 revisions

Here we draft the release notes for the next release.

Note: format is [summary] [commit hash or PR#] [author(s)]

Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for ????

The first line of the commit message is usually a good summary, but please think through each entry and (re)write a summary that helps users quickly determine whether the change would be interesting or useful to them. For example, include the name of the intrinsic/function in the summary so that users don't have to click through each commit themselves.
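As an illustration of the entry format described above, here is a minimal sketch in Python. It is not the project's release notes helper script: the function names `format_entry` and `entries_from_log` are hypothetical, and the tab-separated input (subject, hash, author handle) is an assumed intermediate format, for example something produced with `git log --pretty=format:"%s%x09%h%x09%an"` and then edited by hand to replace author names with GitHub handles.

```python
def format_entry(summary, commit_hash, authors):
    """Build one line as: summary, short commit hash, then @handles.

    Hypothetical helper for illustration only.
    """
    # Accept handles with or without a leading "@".
    handles = " ".join("@" + a.lstrip("@") for a in authors)
    # Abbreviate full hashes to the 7 characters used throughout this page.
    return f"{summary.strip()} {commit_hash[:7]} {handles}"


def entries_from_log(log_text):
    """Parse 'subject<TAB>hash<TAB>author' lines (assumed format)."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        subject, commit_hash, author = line.split("\t")
        entries.append(format_entry(subject, commit_hash, [author]))
    return entries
```

The output is only a starting point: each summary still needs to be reviewed and rewritten by hand as described above.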

SIMDe 0.8

Summary

Implementations of all NEON intrinsics are now complete! (@yyctw, @wewe5215)

SIMDe PRs are now tested using Fedora Rawhide (@junaruga)

There are a total of 6876 SIMD functions on x86, 2937 (42.71%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5270 functions currently in AVX-512, SIMDe implements 1516 (28.77%).

Newly added function families

  • AES: 5 of 6 (83.33%)

Newly added AVX512 function families

Additions to existing families

  • AVX512BW: 6 additional, 337 total of 790 (42.66%)
  • AVX512DQ: 5 additional, 112 total of 376 (29.79%)
  • AVX512F: 48 additional, 1087 total of 2824 (38.49%)
  • AVX512_FP16: 15 additional, 17 total of 1105 (1.54%)
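The percentages in this summary are simply implemented / total, shown to two decimal places. A small sketch for recomputing them when the counts change; the function names `coverage` and `family_line` are hypothetical and are not part of any SIMDe tooling.

```python
def coverage(implemented, total):
    """Return the implemented share as a percentage string with two decimals."""
    return f"{100.0 * implemented / total:.2f}%"


def family_line(name, additional, implemented, total):
    """Format one 'Additions to existing families' bullet (hypothetical helper)."""
    return (f"{name}: {additional} additional, "
            f"{implemented} total of {total} ({coverage(implemented, total)})")


print(coverage(2937, 6876))                  # 42.71%
print(coverage(1516, 5270))                  # 28.77%
print(family_line("AVX512BW", 6, 337, 790))  # AVX512BW: 6 additional, 337 total of 790 (42.66%)
```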

SIMDe currently implements 6670 out of 6670 (100.00%) NEON functions; up from 56.46% in the previous release!

Newly added families

  • abal
  • abal_high
  • abd
  • abdh
  • abdl_high
  • addhn_high
  • aes
  • bfdot
  • bfdot_lane
  • cadd_rot
  • cale
  • calt
  • cmla_lane
  • cmla_rot_lane
  • copy_lane
  • cvt_high
  • cvt_n
  • cvta
  • cvtn
  • cvtp
  • cvtx
  • cvtx_high
  • div
  • dupb_lane
  • duph_lane
  • eor3
  • fmlal
  • fms
  • fms_lane
  • fms_n
  • ld2_dup
  • ld2_lane
  • ld3_dup
  • ld3_lane
  • ld4_dup
  • maxnmv
  • minnmv
  • mla_lane
  • mla_high_lane
  • mls_lane
  • mlsl_high_lane
  • mmla
  • mull_high_lane
  • mull_high_n
  • mulx
  • mulx_lane
  • pmaxnm
  • pminnm
  • qdmlal
  • qdmlal_high
  • qdmlal_high_lane
  • qdmlal_high_n
  • qdmlal_lane
  • qdmlal_n
  • qdmlsl
  • qdmlsl_high
  • qdmlsl_high_lane
  • qdmlsl_high_n
  • qdmlsl_lane
  • qdmlsl_n
  • qdmlslh
  • qdmlslh_lane
  • qdmulhh
  • qdmulhh_lane
  • qdmull_high
  • qdmull_high_lane
  • qdmull_high_n
  • qdmull_lane
  • qdmull_n
  • qdmullh_lane
  • qmovun_high
  • qrdmlah
  • qrdmlah_lane
  • qrdmlahh
  • qrdmlahh_lane
  • qrdmlsh
  • qrdmlsh_lane
  • qrdmlshh
  • qrdmlshh_lane
  • qrdmulhh_lane
  • qrshl
  • qrshlh
  • qrshrn_high_n
  • qrshrnh_n
  • qrshrun_high_n
  • qrshrunh_n
  • qshl_n
  • qshlh_n
  • qshluh_n
  • qshrn_high_n
  • qshrnh_n
  • qshrun_high_n
  • qshrunh_n
  • raddhn
  • raddhn_high
  • rax
  • recp
  • rnd32x
  • rnd32z
  • rnd64x
  • rnd64z
  • rnda
  • rndx
  • rshrn_high_n
  • rsubhn
  • rsubhn_high
  • set_lane
  • sha1
  • sha1h
  • sha256
  • sha512
  • shll_high_n
  • shrn_high_n
  • sli_n
  • sm3
  • sm4
  • sqrt
  • st1_x2
  • st1_x3
  • st1_x4
  • st1q_x2
  • st1q_x3
  • st1q_x4
  • subhn_high
  • sudot_lane
  • usdot
  • usdot_lane

Finally complete families

  • cvtn
  • mla_lane

Details

simde-f16: improve _Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
simde_float16: prefer __fp16 if available aba26f6 @mr-c

Implementation of Arm intrinsics

NEON

cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations e134cc7 @mr-c
cvtn: vcvtnq_u32_f32 is a V8 function 8432c70 @mr-c
min: Remove non-working MMX specialization from simde_vmin_s16 6858b92 @M-HT
shll: Extend constant range in simde_vshll_n_XXX intrinsics (#1064) beb1c61 @M-HT
various: Implement some f16XN types and f16 related intrinsics. (#1071) aae2245 @yyctw
qtbl/qtbx polyfills for A32V7 a2fef9e @easyaspi314
arm: use SIMDE_ARCH_ARM_FMA 7198d6d @mr-c
arm neon: Complex operations from Armv8.3-a (#1077) d08d67c @wewe5215
more fp16 using intrinsics supported by architecture v7 (skip version) (#1081) 5e7c4d4 @yyctw
st1{,q}_*_x{2,3,4}: initial implementation (#1082) 879d1a0 @yyctw
part 1 of implement all intrinsics supported by architecture A64 (#1090) 2eedece @yyctw
Add AES instructions. 23adcd2 805ccd2 @yyctw
Modified simde_float16 to simde_float16_t (#1100) 8a05dc6 @yyctw
implement all intrinsics supported by architecture A64-remaining part (#1093) 018ba24 @yyctw
add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64 c7d314b @yyctw
implement all bf16-related intrinsics (#1110) c59db7c @yyctw

SVE Intrinsics

WASM intrinsics

simd128: fix altivec_p7 version of wasm_f64x2_pmin 96d6e53 @mr-c
simd128: add missing unsigned functions ea5e283 @mr-c
simd128 f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c

x86 intrinsics

sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store} 6ce6030 @mr-c

SSE*

sse: Fix issues related to MXCSR register (#1060) 653aba8 @M-HT
sse: implement _mm_movelh_ps for Arm64 514564e @mr-c
sse _mm_movemask_ps: remove unused code fba97e4 @mr-c
sse2 _mm_pause: more archs, add a basic test 692a2e8 @mr-c
sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 edd4678 @mr-c
sse4.1 _mm_testz_si128: fix backwards short circuit logic f132275 @mr-c

AVX

run test from #926 ce9708c @mr-c
simde_mm256_shuffle_pd fix for natural vector size < 128 1594d7c @mr-c

AVX2

AVX512

fpclass: naive implementation 353bf5f @mr-c
loadu: fix native detection 305f434 @mr-c
set: add simde_x_mm512_set_m256{,d} 67e0c50 @mr-c
gather: add MSVC native fallbacks 7b7e3f6 @mr-c

AVX512FP16 / m512h initial support e97691c @mr-c
fix many native aliases 75014b9 @mr-c

CLMUL

fix natives, some require VPCLMULQDQ f819c52 @mr-c

GFNI

XOP

F16C

FMA

SVML

enable SIMDE_X86_SVML_NATIVE for MSVC 2019+ 593af95 @mr-c

AES

aes: initial implementation of most aes instructions (#1072) 8632391 @Vineg

MIPS MSA intrinsics

msa neon impl: float64x2_t is not avail in A32V7 ae4c4ab @mr-c

Arch support

x86(-64)

fix SIMDE_ARCH_X86_SSE4_2 define 5e4b308 @cbielow

arm64

x86 aes: add neon implementation using the crypto extension fb3554f @mr-c

z/Arch

Altivec

neon/st1: disable last remaining AltiVec implementation 0521245 @mr-c

e2k (Elbrus)

Power

sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
wasm simd128: more powerpc fixes 7cb5691 @mr-c

Compiler Specific

GCC

GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 3fa89c5 @mr-c
GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3 edde42e @mr-c
GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+ 43d86a3 @mr-c
Add workaround for GCC bug 111609 fdafd8e @M-HT
arm neon ld2: silence warnings at -O3 on gcc risc-v 8f56628 @mr-c
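A "fixed in 9.5, 10.4, 11.4, 12+" entry means the fix was backported per release branch, so a plain "version < X" comparison is not enough. The sketch below shows one way to read such an entry. It is an illustration in Python, not SIMDe's actual preprocessor logic; the names `is_affected`, `GCC_95399` and `GCC_94482` are hypothetical. It also assumes that branches older than the first listed one are still affected, which the entries above do not state.

```python
# Per-branch fix releases taken from the entries above.
# Keys are GCC major versions, values are the first fixed (major, minor).
GCC_95399 = {"branches": {9: (9, 5), 10: (10, 4), 11: (11, 4)}, "fixed_from_major": 12}
GCC_94482 = {"branches": {8: (8, 5), 9: (9, 4)}, "fixed_from_major": 10}


def is_affected(version, bug):
    """Return True if a (major, minor) compiler version still needs the workaround."""
    major = version[0]
    if major >= bug["fixed_from_major"]:
        return False  # every release from this major onward carries the fix
    fix = bug["branches"].get(major)
    if fix is None:
        # Assumption: a branch with no listed fix release never received the fix.
        return True
    return version < fix  # tuple comparison within the same branch


print(is_affected((10, 3), GCC_95399))  # True
print(is_affected((10, 4), GCC_95399))  # False
print(is_affected((12, 1), GCC_95399))  # False
```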

Clang

clang powerpc: vec_bperm bug was fixed in clang-14 6feb28a @mr-c
clmul: aarch64 clang has difficulties with poly64x1_t 1e1bd76 @mr-c
clang aarch64: optimization bug 45541 was fixed in clang-15 7ca5712 @mr-c
A32V7: Don't trust clang for load multiple on A32V7 927f141 @easyaspi314
clang wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 25cebbe @mr-c

ClangCL

fp16: don't use _Float16 on ClangCL if not supported 8a6b8c5 @mr-c
svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCL c877fe5 @mr-c

MSVC

avx512 types: avoid using native AVX512 types on MSVC unless required 029d749 @mr-c

Testing with Docker/Podman & CI

Update recipe for qemu git mode 54b8c8f @mr-c
riscv64 gcc: typo fix for endian little 7423339 @mr-c
add new cross sets; Ubuntu Focal and Bionic support b0b9710 @mr-c
native tests: also AVX512, MSA; fix WASM SIMD128 path bdd075b @mr-c
test-flags: support the x86 microarchitecture levels 518b777 @mr-c

preserve test log 9815161 @mr-c
save meson log on error 5207d83 @mr-c

circleci: clang, set -Wno-unsafe-buffer-usage 24c93c2 @mr-c

upgrade qemu ; fixes remaining ppc64el fails! e91944b @mr-c
tidy matrix ordering for easier to read job names b52ac36 @mr-c
add clang-qemu: aarch64, riscv64, ppc64el, s390x 8a6dbab @mr-c
test armv7 with gcc-12 via qemu 8cd8de1 @mr-c
add armel to gcc and clang qemu matrices 4ca849b @mr-c
add armv7 to clang-qemu matrix a144aca @mr-c
use GCC 12 for adv x64 native testing + AVX512FP f156b41 @mr-c
expand mac-os/xcode testing matrix 8055410 @mr-c
fix macos-13+brew failure c6149de @mr-c
test with clang-16 e25ced8 @mr-c
add gcc-13 43ac8fc @mr-c
run on commits to the primary branch to prime the cache 6055bfb @mr-c

Start testing SIMDe PRs using Fedora Rawhide d64b103 6ae0763 b309d89 4d55fc2 643c419 @junaruga

restart testing with Travis CI 93905f5 @mr-c
simplify x86 ISA matrix 6b7c1b3 @mr-c

Misc

README: mark F16C as complete 2d87cf5 @mr-c
README: Give credit to creator/maintainer of the vcpkg for SIMDe ceb1e73 @mr-c
README: related projects: add AvxToNeon 13bf92a @mr-c

Template for next time

# Summary
## [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md)
### Newly added function families
### Additions to existing families
## [Neon](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md)
## [MSA](https://github.com/simd-everywhere/implementation-status/blob/main/msa.md)
# Details
## Implementation of Arm intrinsics
### NEON
### SVE Intrinsics
## WASM intrinsics
## x86 intrinsics
### SSE*
### AVX
### AVX2
### AVX512
### GFNI 
### XOP
### F16C
### FMA
### SVML
## MIPS MSA intrinsics
## Arch support
### arm64
### z/Arch
### Altivec
### e2k (Elbrus)
### Power
## Testing with Docker/Podman & CI
### [Appveyor](https://ci.appveyor.com/project/nemequ/simde/history)
### [Azure](https://dev.azure.com/simd-everywhere/SIMDe/_build?definitionId=3)
### [Circle CI](https://app.circleci.com/pipelines/github/simd-everywhere/simde)
### [Cirrus CI](https://cirrus-ci.com/github/simd-everywhere/simde)
### [Local testing with Docker/Podman](https://github.com/simd-everywhere/simde/tree/master/docker#readme)
### [Drone.io](https://cloud.drone.io/simd-everywhere/simde)
### [GitHub Actions](https://github.com/simd-everywhere/simde/actions)
### [Netlify](https://app.netlify.com/sites/simde/)
### [Packit CI](https://dashboard.packit.dev/projects/github.com/simd-everywhere/simde)
### [Semaphore CI](https://nemequ.semaphoreci.com/projects/simde)
### [Travis](https://app.travis-ci.com/github/simd-everywhere/simde)
## Misc