MaxMath v2.1.1
Known Issues
- half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard - Unity has not yet responded to my bug report regarding their "half" implementation
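For reference, a minimal sketch of what IEEE 754 requires of equality comparisons. It uses `float`, because standard C has no half type, and all helper names are hypothetical. The note above does not say which inputs are affected; the bitwise comparison is shown only as a common non-conforming shortcut, not as Unity's implementation.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float (hypothetical helper). */
static uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* IEEE 754 equality: NaN compares unequal to everything, +0 equals -0. */
static bool ieee_equal(float a, float b)
{
    return a == b;
}

/* Bit-pattern equality: disagrees with IEEE 754 for NaN and for signed zeros. */
static bool bitwise_equal(float a, float b)
{
    return bits_of(a) == bits_of(b);
}
```

Under IEEE 754 rules, `ieee_equal(NAN, NAN)` is false and `ieee_equal(0.0f, -0.0f)` is true, while `bitwise_equal` gives the opposite answer in both cases.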
Fixes
- fixed a Burst compilation error triggered by "Sse4_1.blend_epi16" when compiling for SSE2, caused by the fallback code not using a constant value for "imm8"
- fixed incorrect CPU feature checks for quarter vector type-conversion code when compiling for SSE2
- fixed "tzcnt" implementations (were completely broken)
- fixed scalar (single value and C# fallback) "lzcnt" implementations for (s)byte and (u)short values and (u)long4 vectors
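As a point of comparison for the "lzcnt" and "tzcnt" fixes, here is a minimal scalar reference for 8-bit values. The function names are hypothetical, and returning 8 for a zero input is an assumption that mirrors the x86 LZCNT/TZCNT convention; the notes do not state what MaxMath returns in that case, nor what the original bugs were.

```c
#include <stdint.h>

/* Leading zeros of an 8-bit value, counted within 8 bits (not 32).
   Returns 8 for x == 0 (assumed convention). */
static unsigned lzcnt8_ref(uint8_t x)
{
    unsigned n = 0;
    for (uint8_t mask = 0x80; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Trailing zeros of an 8-bit value. Returns 8 for x == 0 (assumed convention). */
static unsigned tzcnt8_ref(uint8_t x)
{
    unsigned n = 0;
    for (uint8_t mask = 0x01; mask != 0 && (x & mask) == 0; mask <<= 1)
        n++;
    return n;
}
```

A general pitfall with narrow types is widening the value to 32 bits before counting: the leading-zero count of a byte then comes out 24 too high unless it is corrected for the operand width.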
Additions
- added "ulong countbits(void* ptr, ulong bytes)", which counts the number of 1-bits in a given block of memory, using Wojciech Mula's SIMD population count algorithm
- added high performance and/or SIMD "gcd" a.k.a. greatest common divisor functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "lcm" a.k.a. least common multiple functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
- added high performance and/or SIMD "intsqrt" (integer square root, i.e. floor(sqrt(x))) functions for all integer and integer vector types; the functions for signed integers and vectors throw an ArgumentOutOfRangeException if a value is negative
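The semantics of the additions above can be sketched with plain scalar code. This is a reference model only: every name is hypothetical, the algorithms are textbook ones (Euclid, bit-by-bit square root, byte-wise bit counting) rather than MaxMath's SIMD implementations or Wojciech Mula's algorithm, and an error code stands in for the ArgumentOutOfRangeException. Treating lcm(a, 0) as 0 is an assumption.

```c
#include <stdint.h>

/* Magnitude of a signed value as unsigned; well defined even for INT32_MIN. */
static uint32_t abs_u32(int32_t x)
{
    return x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
}

/* Euclid's algorithm on unsigned operands. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Signed inputs, unsigned result: gcd(INT32_MIN, 0) is 2^31,
   which a signed 32-bit integer cannot hold. */
static uint32_t gcd_i32(int32_t a, int32_t b)
{
    return gcd_u32(abs_u32(a), abs_u32(b));
}

/* lcm(a, 0) is taken to be 0 here (an assumption); the result wraps on overflow. */
static uint32_t lcm_u32(uint32_t a, uint32_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return a / gcd_u32(a, b) * b;   /* divide first to keep intermediates small */
}

/* floor(sqrt(x)), built bit by bit from the top using integer arithmetic only. */
static uint32_t intsqrt_u32(uint32_t x)
{
    uint32_t r = 0;
    for (uint32_t bit = 1u << 15; bit != 0; bit >>= 1) {
        uint32_t t = r | bit;
        if ((uint64_t)t * t <= x)
            r = t;
    }
    return r;
}

/* Signed variant: returns -1 for negative input (standing in for the
   exception) and 0 on success, writing the root to *result. */
static int intsqrt_i32(int32_t x, uint32_t *result)
{
    if (x < 0)
        return -1;
    *result = intsqrt_u32((uint32_t)x);
    return 0;
}

/* Counts the 1-bits in a block of memory, one byte at a time. */
static uint64_t countbits_ref(const void *ptr, uint64_t bytes)
{
    const uint8_t *p = (const uint8_t *)ptr;
    uint64_t total = 0;
    for (uint64_t i = 0; i < bytes; i++) {
        uint8_t v = p[i];
        while (v != 0) {
            v &= (uint8_t)(v - 1);   /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}
```

For example, `gcd_i32(-12, 18)` yields 6 and `intsqrt_u32(15)` yields 3. The INT32_MIN case is one plausible reason for returning unsigned types from "gcd" and "lcm", though the notes do not state the motivation.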
Improvements
- performance improvements of "avg" functions for signed integer vectors
- added SIMD implementations of the "transpose" functions for all matrix types
- added SSE4 and SSE2 fallback code for variable bitshifts ("shl", "shrl" and "shra")
- added SSE2 fallback code for (s)byte vector-by-vector division and modulo operations
- added SSE2 fallback code for "all_dif" for (s)byte16, (u)short8 and (u)int8 vectors
- added SSE2 fallback code for typecasting, propagating through the entire library
- added SSE2 fallback code for "addsub" and "subadd" functions
- bitmask32 and bitmask64 now allow for masks to be up to 32 and 64 bits wide, respectively
Changes
- renamed "BurstCompilerException" to "CPUFeatureCheckException"
- "shl", "shrl" and "shra" now have undefined behavior when shifting by any amount outside of the interval [0, 8 * sizeof(integer_type) - 1], for performance reasons and because of differences between SSE, AVX and managed C#
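To illustrate why out-of-range shift amounts cannot be given one cheap, consistent meaning, the sketch below models two conventions for 32-bit left shifts: managed C#, which uses only the low 5 bits of the count, and the SSE2/AVX2 logical shift instructions, which produce 0 once the count exceeds 31. The function names are hypothetical and this is not MaxMath's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n lies in [0, 8 * sizeof(uint32_t) - 1], the only range with defined results. */
static bool shift_in_range_u32(int n)
{
    return n >= 0 && n <= 8 * (int)sizeof(uint32_t) - 1;
}

/* Models managed C#: the count is reduced modulo 32 before shifting. */
static uint32_t shl_masked_u32(uint32_t x, int n)
{
    return x << ((uint32_t)n & 31u);
}

/* Models SIMD logical shifts: the count is treated as unsigned and
   any count above 31 yields 0. */
static uint32_t shl_simd_like_u32(uint32_t x, int n)
{
    return ((uint32_t)n > 31u) ? 0u : x << n;
}
```

Both models agree for in-range counts, e.g. shifting 1 by 4 gives 16 either way. For a count of 32 the masked model returns the input unchanged while the SIMD-like model returns 0, so callers should keep counts within the stated interval.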
Fixed Oversights
- added "shl", "shrl" and "shra" (varying per element) functions for (s)byte and (u)short vectors
- added "ror" and "rol" (varying per element) functions for (s)byte and (u)short vectors
- added "compareto" functions for all vector types except half- and quarter vectors
- added "all_dif" functions for (s)byte32 vectors
- added vshr/l and vror/l functions for (s)byte32 and (u)short16 vectors
2.1.1 Hotfix
Fixes
- fixed SSE2 "shl", "shrl" and "shra" implementations
- fixed SSE2 "intsqrt" implementations
Improvements
- improved performance of (s)byte2, -3, -4, -8, -16 and (u)short2, -3, -4, -8 "gcd" functions (and thus "lcm") when compiling for AVX2
- improved performance of "tzcnt" and "lzcnt" implementations for all vector types when compiling for SSE4 or higher, propagating through much of the library
Fixed Oversights
- added documentation for RandomX methods