Commit
### Known Issues

- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard
- Unity has not yet responded to my bug report regarding their "half" implementation

### Fixes

- fixed a Burst compilation error triggered by "Sse4_1.blend_epi16" when compiling for SSE2, caused by the fallback code not using a constant value for "imm8"
- fixed incorrect CPU feature checks for quarter vector type-conversion code when compiling for SSE2
- fixed "tzcnt" implementations (were completely broken)
- fixed scalar (single value and C# fallback) "lzcnt" implementations for (s)byte and (u)short values and (u)long4 vectors

### Additions

- added "ulong countbits(void* ptr, ulong bytes)", which counts the number of 1-bits in a given block of memory, using Wojciech Mula's SIMD population count algorithm
- added high performance and/or SIMD "gcd" a.k.a. greatest common divisor functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "lcm" a.k.a. least common multiple functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "intsqrt" functions (integer square root, i.e. floor(sqrt(x))) for all integer and integer vector types; the functions for signed integers throw an ArgumentOutOfRangeException if a value is negative

### Improvements

- performance improvements of "avg" functions for signed integer vectors
- added SIMD implementations of the "transpose" functions for all matrix types
- added SSE4 and SSE2 fallback code for variable bitshifts ("shl", "shrl" and "shra")
- added SSE2 fallback code for (s)byte vector-by-vector division and modulo operations
- added SSE2 fallback code for "all_dif" for (s)byte16, (u)short8 and (u)int8 vectors
- added SSE2 fallback code for typecasting, propagating through the entire library
- added SSE2 fallback code for "addsub" and "subadd" functions
- bitmask32 and bitmask64 now allow for masks to be up to 32 and 64 bits wide, respectively

### Changes

- renamed "BurstCompilerException" to "CPUFeatureCheckException"
- "shl", "shrl" and "shra" now have undefined behavior when bitshifting any value beyond [0, 8 * sizeof(integer_type)], both for performance reasons and because of differences between SSE, AVX and managed C#

### Fixed Oversights

- added "shl", "shrl" and "shra" functions for (s)byte and (u)short vectors
- added "ror" and "rol" (varying per element) functions for (s)byte and (u)short vectors
- added "compareto" functions for all vector types except half- and quarter vectors
- added "all_dif" functions for (s)byte32 vectors
- added vshr/l and vror/l functions for (s)byte32 and (u)short16 vectors
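The semantics of the new "countbits", "gcd", "lcm" and "intsqrt" functions can be pinned down with a scalar reference sketch. It is written in Python for brevity and is not the library's C# or SIMD code. The function names ending in `_ref` are hypothetical. Returning 0 from `lcm_ref` when an input is 0 is an assumption, and `ValueError` merely stands in for the ArgumentOutOfRangeException named in the notes. One plausible reason for the unsigned return types (also an assumption) is that the gcd of the most negative 32-bit integer and 0 is 2^31, which does not fit in a signed 32-bit integer.

```python
import math


def gcd_ref(a: int, b: int) -> int:
    """Euclid's algorithm on the magnitudes; the result is never negative."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def lcm_ref(a: int, b: int) -> int:
    """Least common multiple via gcd; assumed to be 0 if either input is 0."""
    if a == 0 or b == 0:
        return 0
    # Divide before multiplying to keep the intermediate value small.
    return abs(a) // gcd_ref(a, b) * abs(b)


def intsqrt_ref(x: int) -> int:
    """floor(sqrt(x)); rejects negative input (ValueError is a stand-in)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.isqrt(x)


def countbits_ref(block: bytes) -> int:
    """Number of 1-bits in a block of memory, one byte at a time."""
    return sum(bin(byte).count("1") for byte in block)
```

For example, `gcd_ref(-2**31, 0)` evaluates to `2147483648`, and `countbits_ref(bytes([0xFF, 0x01]))` evaluates to `9`. A reference like this is mainly useful for cross-checking vectorized results lane by lane.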
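The change to "shl", "shrl" and "shra" mentions differences between SSE, AVX and managed C#. A minimal model of one such difference, for 32-bit left shifts, is sketched below in Python. Both function names are hypothetical. The sketch only models two well-known behaviors: C# masks the shift count of a 32-bit operand to its low 5 bits, while the SSE2/AVX2 32-bit shift instructions produce 0 for any count above 31. It says nothing about what the library itself returns for out-of-range counts, since that is declared undefined.

```python
MASK32 = 0xFFFFFFFF


def shl_csharp_uint(x: int, n: int) -> int:
    """Managed C# `x << n` on a uint: only the low 5 bits of n are used."""
    return (x << (n & 31)) & MASK32


def shl_simd_lane32(x: int, n: int) -> int:
    """One 32-bit lane of an SSE2/AVX2 left shift: counts above 31 yield 0."""
    n &= MASK32  # the hardware reads the count as an unsigned value
    return 0 if n > 31 else (x << n) & MASK32
```

Inside the valid range both models agree, e.g. both return `16` for `x = 1, n = 4`. At `n = 32` the C# model returns `1` while the SIMD model returns `0`, which is the kind of divergence that makes a single defined result for out-of-range counts costly to guarantee.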
1 parent 2970a15, commit a6b5c64
Showing 142 changed files with 19,088 additions and 6,549 deletions.