MaxMath v2.1.1

MrUnbelievable92 released this 01 Mar 00:42

Known Issues

  • half8 "equals" and "not equals" operators don't conform to the IEEE 754 standard; Unity has not yet responded to my bug report regarding their "half" implementation

Fixes

  • fixed a Burst compilation error triggered by "Sse4_1.blend_epi16" when compiling for SSE2, caused by the fallback code not using a constant value for "imm8"
  • fixed incorrect CPU feature checks for quarter vector type-conversion code when compiling for SSE2
  • fixed "tzcnt" implementations (were completely broken)
  • fixed scalar (single value and C# fallback) "lzcnt" implementations for (s)byte and (u)short values and (u)long4 vectors
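
For reference, the semantics that "lzcnt" and "tzcnt" are expected to have on narrow types can be sketched as follows. This is a minimal Python model, not the library's C# code; the explicit `bits` parameter and the choice to return the type width for a zero input (mirroring the x86 LZCNT/TZCNT instructions) are assumptions made for illustration. It shows why the width of the type matters: counting leading zeros of a byte after widening it to 32 bits gives a result that is too large by 24.

```python
def lzcnt(x: int, bits: int) -> int:
    """Count leading zero bits of x, viewed as an unsigned integer of width `bits`."""
    x &= (1 << bits) - 1          # reinterpret signed values as their unsigned bit pattern
    return bits - x.bit_length()  # bit_length() is 0 for x == 0, so lzcnt(0) == bits


def tzcnt(x: int, bits: int) -> int:
    """Count trailing zero bits of x, viewed as an unsigned integer of width `bits`."""
    x &= (1 << bits) - 1
    if x == 0:
        return bits                   # assumed convention: zero input yields the type width
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit


# The same value gives different leading-zero counts at different widths.
byte_result = lzcnt(1, 8)    # 7
int_result = lzcnt(1, 32)    # 31
```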

Additions

  • added "ulong countbits(void* ptr, ulong bytes)", which counts the number of 1-bits in a given block of memory, using Wojciech Mula's SIMD population count algorithm
  • added high performance and/or SIMD "gcd" a.k.a. greatest common divisor functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
  • added high performance and/or SIMD "lcm" a.k.a. least common multiple functions for (u)int, (u)long and all integer vector types, which always return unsigned types and vectors
  • added high performance and/or SIMD "intsqrt" (integer square root, floor(sqrt(x))) functions for all integer and integer vector types; the functions for signed integers and vectors throw an ArgumentOutOfRangeException if a value is negative
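
The contracts of the functions added above can be modeled with a short scalar sketch. It is written in Python for brevity and is not the library's implementation: the binary (Stein's) GCD is one common branch-light technique and only an assumption about the approach, `countbits` here is a naive per-byte count rather than Wojciech Mula's SIMD algorithm, and `ValueError` stands in for C#'s ArgumentOutOfRangeException.

```python
import math


def gcd(a: int, b: int) -> int:
    """Binary (Stein's) GCD on magnitudes; the result is never negative."""
    a, b = abs(a), abs(b)
    if a == 0:
        return b
    if b == 0:
        return a
    both = a | b
    shift = (both & -both).bit_length() - 1  # number of common factors of two
    a >>= (a & -a).bit_length() - 1          # strip trailing zeros: a is now odd
    while b != 0:
        b >>= (b & -b).bit_length() - 1      # strip trailing zeros: b is now odd
        if a > b:
            a, b = b, a
        b -= a                               # odd minus odd is even (or zero)
    return a << shift


def lcm(a: int, b: int) -> int:
    """Least common multiple, defined as 0 if either input is 0."""
    a, b = abs(a), abs(b)
    if a == 0 or b == 0:
        return 0
    return a // gcd(a, b) * b                # divide first to keep the intermediate small


def intsqrt(x: int) -> int:
    """floor(sqrt(x)); negative input is rejected."""
    if x < 0:
        raise ValueError("intsqrt is undefined for negative values")
    return math.isqrt(x)


def countbits(block: bytes) -> int:
    """Number of 1-bits in a block of memory (naive reference version)."""
    return sum(bin(byte).count("1") for byte in block)
```

Note that `gcd(-4, 6)` returns 2 in this model, matching the note that results are always unsigned.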

Improvements

  • performance improvements of "avg" functions for signed integer vectors
  • added SIMD implementations of the "transpose" functions for all matrix types
  • added SSE4 and SSE2 fallback code for variable bitshifts ("shl", "shrl" and "shra")
  • added SSE2 fallback code for (s)byte vector-by-vector division and modulo operations
  • added SSE2 fallback code for "all_dif" for (s)byte16, (u)short8 and (u)int8 vectors
  • added SSE2 fallback code for typecasting, propagating through the entire library
  • added SSE2 fallback code for "addsub" and "subadd" functions
  • bitmask32 and bitmask64 now allow for masks to be up to 32 and 64 bits wide, respectively

Changes

  • renamed "BurstCompilerException" to "CPUFeatureCheckException"
  • "shl", "shrl" and "shra" now have undefined behavior when the shift amount lies outside of the interval [0, 8 * sizeof(integer_type) - 1], for performance reasons and because of differences between SSE, AVX and managed C#
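
The platform differences behind the shift change can be illustrated with a small model. The sketch below emulates two documented behaviors for 32-bit left shifts: managed C# uses only the low 5 bits of the shift count, while the AVX2 variable shift instruction (vpsllvd) yields 0 for any count above 31. The function names are hypothetical, and results are returned as unsigned 32-bit bit patterns.

```python
MASK32 = 0xFFFFFFFF


def shl_csharp_int(x: int, n: int) -> int:
    """Model of C# `int << n`: the count is reduced to its low 5 bits."""
    return (x << (n & 31)) & MASK32


def shl_avx2_epi32(x: int, n: int) -> int:
    """Model of one lane of AVX2 vpsllvd: counts above 31 produce 0."""
    n &= MASK32                      # the count lane is interpreted as unsigned
    return 0 if n > 31 else (x << n) & MASK32


# In range, both models agree; out of range, they diverge.
in_range = (shl_csharp_int(1, 4), shl_avx2_epi32(1, 4))        # (16, 16)
out_of_range = (shl_csharp_int(1, 33), shl_avx2_epi32(1, 33))  # (2, 0)
```

Because the two paths disagree only for out-of-range counts, leaving those counts undefined avoids extra masking or clamping on every call.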

Fixed Oversights

  • added "shl", "shrl" and "shra" (varying per element) functions for (s)byte and (u)short vectors
  • added "ror" and "rol" (varying per element) functions for (s)byte and (u)short vectors
  • added "compareto" functions for all vector types except half- and quarter vectors
  • added "all_dif" functions for (s)byte32 vectors
  • added vshr/l and vror/l functions for (s)byte32 and (u)short16 vectors
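
As a rough illustration of what "varying per element" means for the rotate functions, the following Python sketch rotates each element of a vector by its own amount. The helper name `rol_each`, the default width of 8 bits, and the reduction of the rotate amount modulo the width are assumptions for this example and do not describe the library's guaranteed behavior.

```python
def rol(x: int, n: int, bits: int = 8) -> int:
    """Rotate the low `bits` bits of x left by n positions."""
    mask = (1 << bits) - 1
    x &= mask
    n %= bits                                   # assumption: amount wraps at the width
    return ((x << n) | (x >> (bits - n))) & mask


def ror(x: int, n: int, bits: int = 8) -> int:
    """Rotate the low `bits` bits of x right by n positions."""
    mask = (1 << bits) - 1
    x &= mask
    n %= bits
    return ((x >> n) | (x << (bits - n))) & mask


def rol_each(xs, ns, bits: int = 8):
    """Per-element rotate: element i of xs is rotated by element i of ns."""
    return [rol(x, n, bits) for x, n in zip(xs, ns)]


rotated = rol_each([0x96, 0x01], [3, 7])        # [0xB4, 0x80]
```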

2.1.1 Hotfix

Fixes

  • fixed SSE2 "shl", "shrl" and "shra" implementations
  • fixed SSE2 "intsqrt" implementations

Improvements

  • improved performance of (s)byte2, -3, -4, -8, -16 and (u)short2, -3, -4, -8 "gcd" functions (and thus "lcm") when compiling for AVX2
  • improved performance of "tzcnt" and "lzcnt" implementations for all vector types when compiling for SSE4 or higher, propagating through much of the library

Fixed Oversights

  • added documentation for RandomX methods