v2.3.5

@MrUnbelievable92 MrUnbelievable92 released this 20 Oct 19:01
· 38 commits to master since this release
e5427c0

Known Issues

  • half8 == and != operators don't conform to the IEEE 754 standard (compliant with Unity.Mathematics)
  • (s)byte, (u)short vector and (U)Int128 multiplication, division and modulo operations by compile time constants are not optimal
  • optimized (U)Int128 comparison operators didn't make it into this release
  • bool vectors generated from operations on non-(s)byte vectors do not generate the most optimal machine code possible, partly due to an LLVM performance regression, partly due to other compiler related difficulties
  • most vectorized function overloads don't communicate return value ranges to the compiler yet, so they miss out on more efficient code paths that are selected at compile time through compile-time-only value range checks
  • AVX2 (s)byte32 all_dif lookup tables are currently way too large (kilobytes)

Fixes

  • (Issue #10) bool8/16/32 are now blittable when not used within an IJob

Additions

  • added comb(n, k) for scalar and vector integer types. This is the binomial coefficient, or "n choose k". The standard code path uses an O(min(k, n - k)) time algorithm that cannot overflow unless the result itself overflows (which is not true for most solutions found online that claim it). An optional Promise parameter can select an O(1) code path using the factorial formula instead
  • added perm(n, k) for scalar and vector integer types. This is the number of "k-permutations of n". The standard code path uses an O(k) time algorithm that cannot overflow unless the result itself overflows. An optional Promise parameter can select an O(1) code path using the factorial formula instead
  • added nextgreater(x) for all types. For integer types, it is a wrapper function for addsaturated(x, 1). For floating point types, it returns the next greater representable floating point value(s), unless x is NaN or infinite. An optional Promise parameter allows for numerous optimizations.
  • added nextsmaller(x) for all types. For integer types, it is a wrapper function for subsaturated(x, 1). For floating point types, it returns the next smaller representable floating point value(s), unless x is NaN or infinite. An optional Promise parameter allows for numerous optimizations
  • added nexttoward(from, to) for all types, returning the next representable integer/floating point value(s) in a given direction, unless from is equal to to. For floating point types, from is returned if from is NaN or infinite. If to is NaN, NaN is returned. An optional Promise parameter allows for numerous optimizations.
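
The overflow claim for comb and perm can be illustrated with a small sketch. This is not MaxMath's C# implementation; it is a Python model under the assumption of an unsigned 64-bit result, and the names comb_u64, perm_u64 and U64_MAX are hypothetical. The textbook update result * (n - k + i) / i can exceed 64 bits in the multiplication even when the quotient fits, so the sketch cancels common factors with a gcd before multiplying. Every intermediate value is then bounded by the final result.

```python
from math import gcd

U64_MAX = (1 << 64) - 1  # hypothetical stand-in for an unsigned 64-bit register


def comb_u64(n: int, k: int) -> int:
    """n choose k in O(min(k, n - k)) steps; intermediates never exceed the result."""
    if k > n:
        return 0
    k = min(k, n - k)  # C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        factor = n - k + i
        # result * factor / i is an integer (it is C(n - k + i, i)).
        # Cancel gcd(result, i) first; the rest of i then divides factor exactly.
        g = gcd(result, i)
        result //= g
        factor //= i // g
        result *= factor
        if result > U64_MAX:  # Python ints are unbounded, so model the overflow
            raise OverflowError("result does not fit in 64 bits")
    return result


def perm_u64(n: int, k: int) -> int:
    """k-permutations of n in O(k) steps: n * (n - 1) * ... * (n - k + 1)."""
    if k > n:
        return 0
    result = 1
    for factor in range(n - k + 1, n + 1):
        result *= factor  # every partial product is <= the final product
        if result > U64_MAX:
            raise OverflowError("result does not fit in 64 bits")
    return result
```

For example, comb_u64(67, 33) fits in 64 bits, while the naive multiply-then-divide update would overflow on its last iteration.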

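The floating point behaviour of nextgreater can be sketched by stepping the bit pattern of the value. This is a Python model for 64-bit doubles only, not the library's code; next_greater is a hypothetical name, and returning NaN and infinities unchanged is an assumption based on the wording above.

```python
import math
import struct


def next_greater(x: float) -> float:
    """Next representable double above x (sketch; NaN and infinities are returned as-is)."""
    if math.isnan(x) or math.isinf(x):
        return x  # assumption: non-finite inputs pass through unchanged
    if x == 0.0:
        # covers +0.0 and -0.0: the successor is the smallest positive subnormal
        return struct.unpack("<d", struct.pack("<Q", 1))[0]
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # For positive values a larger bit pattern is a larger number;
    # for negative values the magnitude has to shrink instead.
    bits += 1 if x > 0.0 else -1
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Stepping down works the same way with the signs swapped, and a nexttoward-style function can pick one of the two directions by comparing its arguments.
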
Improvements

  • improved performance of 64bit vectorized division thanks to a newly implemented and further optimized algorithm from a July 13th 2022 research paper. It replaces a vectorized loop with straight line code; the loop was rather slow, ran for up to 64 iterations, and allowed no instruction level parallelism outside the loop until it finished executing after an almost certainly mispredicted branch. Due to "recent" improvements to divider circuits, this code path is inferior to hardware supported scalar division via element extraction for (u)long2 specifically, even when the quotient and/or remainder vector is in the middle of a dependency chain and even in tight loops. It is thus only implemented for (u)long3/4 types and only if compiling for AVX2
  • improved performance and reduced code size of division for (s)byte vectors of up to 8 elements and for every (u)short vector if not compiling with FloatMode.Fast. Reduced the constants possibly read from RAM in either case.
  • fixed performance regression of SIMD register <-> software abstraction conversions for types using up the entirety of a hardware register
  • lcm for (s)byte vectors with 8 elements or less: decreased code size by 20 or 28 bytes; removed 2 or 4 or 8 bytes of constant data read from RAM; reduced latency by 2 or 3 clock cycles
  • verified and increased the (u)long scalar- and vector intcbrt Promise.Unsafe0 range from [0, 1ul << 40] to [0, 1ul << 46], the code path of which is also possibly chosen at compile time
  • implemented optimized quarter{X} IEEE-754 comparison operators (without having to cast to float{X}). Vectorized half{X} comparisons are implemented in MaxMath.Intrinsics.Xse as well and used where appropriate. compareto function overloads for quarter{X} and half{X} were implemented.
  • reduced latency of add/subsaturated for scalar Int128s, scalar and vector longs as well as vector ints by about a third
  • replaced (U)Int128.ToString(null, null)'s call to BigInteger.ToString(), and thus unnecessary heap allocations, with an optimized implementation
  • (u)short8 / and % operators now correctly check for SSE2 support rather than AVX2
  • removed aliased fixed size buffers from all types, also improving indexer operator performance if the index is a compile time constant (in some cases)
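
One way to format a 128-bit integer without a general big-integer type is to peel off decimal chunks that each fit in a 64-bit word. The release notes do not say how the optimized ToString works, so the following is only an assumed illustration in Python; u128_to_decimal, CHUNK and MASK128 are hypothetical names.

```python
CHUNK = 10**19             # largest power of ten that fits in an unsigned 64-bit word
MASK128 = (1 << 128) - 1   # largest unsigned 128-bit value


def u128_to_decimal(value: int) -> str:
    """Decimal string of an unsigned 128-bit value, built from at most three chunks."""
    if not 0 <= value <= MASK128:
        raise ValueError("value is not an unsigned 128-bit integer")
    chunks = []
    while True:
        value, low = divmod(value, CHUNK)  # low holds the next 19 decimal digits
        chunks.append(low)
        if value == 0:
            break
    # The most significant chunk is printed as-is; lower chunks keep their leading zeros.
    head = str(chunks[-1])
    tail = "".join(f"{chunk:019d}" for chunk in reversed(chunks[:-1]))
    return head + tail
```

Because 2^128 has 39 decimal digits, the loop runs at most three times, which is what makes a fixed-size, allocation-light formatter plausible in a language with fixed-width integers.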

Changes

  • Burst compiled code that passes a Promise argument which is not a compile time constant will now throw an exception in DEBUG, as such an argument represents significant overhead instead of an optimization. The exception currently does not name the offending function, only the Burst compiled job/function that threw it.

Fixed Oversights

  • added explicit type conversion operators for scalar floats and doubles to half8 and all quarter vectors (as well as scalar halfs to quarter vectors)