
MaxMath v2.2.0

MrUnbelievable92 released this 13 Sep 00:18

Known Issues

  • half8 == and != operators do not conform to the IEEE 754 standard; Unity has not yet responded to my bug report regarding their "half" implementation
  • Multiplication, division and modulo operations by compile-time constants are not yet optimal for (s)byte and (u)short vectors and for (U)Int128. For (U)Int128, this requires a new Burst feature along the lines of T Constant.ForceCompileTimeEvaluation<T, U>(Func<U, T> code) (proposed). Work on (s)byte and (u)short vectors is currently in progress and will beat any compiler; the current, tested state of all possible optimizations is included in this version.
  • pow functions with compile-time constant exponents do not yet have special-case optimizations for many fractional exponents. math.rsqrt would often be used in those cases for optimal performance, but it is actually slower when Unity.Burst.FloatMode is set to anything other than FloatMode.Fast. Guaranteeing optimal performance would require compile-time access to the current FloatMode (proposed)
  • double (r)cbrt functions are currently not optimized
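The second known issue concerns multiplication, division and modulo by compile-time constants. A common way compilers optimize division by a constant is to multiply by a precomputed fixed-point reciprocal (a "magic number") and then shift. The C sketch below illustrates the idea for unsigned 8-bit division by 3; `div3_u8` and `mod3_u8` are hypothetical names, the constants were derived for this single case, and nothing here is taken from MaxMath's or Burst's actual code.

```c
#include <stdint.h>

/* Hypothetical helper: divides an unsigned 8-bit value by the constant 3
   without a division instruction. 171 / 512 slightly overestimates 1/3;
   the error, x / 1536, is below 1/6 for x <= 255 and therefore never
   changes the floor of x / 3. */
static inline uint8_t div3_u8(uint8_t x)
{
    /* Widen before multiplying: x * 171 is at most 43605. */
    return (uint8_t)(((uint16_t)x * 171u) >> 9);
}

/* The remainder follows from the quotient, so modulo by the same
   constant only costs one extra multiply and subtract. */
static inline uint8_t mod3_u8(uint8_t x)
{
    return (uint8_t)(x - div3_u8(x) * 3u);
}
```

Each divisor needs its own multiplier and shift, and signed division needs an extra correction step for negative dividends, which is part of why optimizing every constant for every vector type is laborious.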

Fixes

  • linked float8 rcp and rsqrt functions to Burst's FloatMode and FloatPrecision
  • when dividing a short16 vector by another short16 vector while compiling for AVX or higher, short.MinValue / -1 now correctly overflows to short.MinValue
  • fixed scalar quarter to double conversion for when the quarter value is negative
  • fixed scalar half to quarter conversion for when the half value is negative
  • fixed vector quarter to ulong conversion for when a quarter value is negative
  • fixed (u)short8 to quarter8 conversion

Additions

Added saturation arithmetic for all scalar and vector types. Saturation arithmetic clamps the result of an operation to type.MinValue if underflow occurs and to type.MaxValue if overflow occurs, and it has single-instruction hardware support for (s)byte and (u)short. The included functions are:

  • addsaturated
  • subsaturated
  • mulsaturated
  • divsaturated (only clamps the division of floating point types and signed integer division such as sbyte.MinValue (= -128) / -1 to sbyte.MaxValue (= 127), which would otherwise cause a hardware exception for int and long)
  • castsaturated (from all types to all other types with a smaller range)
  • csumsaturated
  • cprodsaturated
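The clamping rule can be sketched in C for 8-bit integers: compute each result in a wider type, then clamp it to the target range. The `_i8`/`_u8` function names are hypothetical and only mirror the MaxMath function names; this is a scalar reference model, not the vectorized code the library emits.

```c
#include <stdint.h>

/* Clamp a wide intermediate result into the int8_t range. */
static inline int8_t clamp_i8(int32_t v)
{
    if (v < INT8_MIN) return INT8_MIN;
    if (v > INT8_MAX) return INT8_MAX;
    return (int8_t)v;
}

/* int8_t operands are widened to int32_t, so no intermediate can overflow. */
static inline int8_t addsaturated_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a + b); }
static inline int8_t subsaturated_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a - b); }
static inline int8_t mulsaturated_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a * b); }

/* Signed division only overflows for INT8_MIN / -1 (= +128), which is
   clamped to INT8_MAX. Precondition: b != 0. */
static inline int8_t divsaturated_i8(int8_t a, int8_t b) { return clamp_i8((int32_t)a / b); }

/* Narrowing conversion that clamps instead of truncating bits. */
static inline int8_t castsaturated_i32_to_i8(int32_t v) { return clamp_i8(v); }

/* Unsigned variants only need to clamp at one end each. */
static inline uint8_t addsaturated_u8(uint8_t a, uint8_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

static inline uint8_t subsaturated_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}
```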

(U)Int128

  • added high-performance (U)Int128 types with full library support: all operators, type conversions and functions support these types. In Burst code, most operations on both types compile down to optimal machine code. Exceptions: 1) signed 64x64-bit to 128-bit multiplication; 2) *, /, % and divrem functions with a scalar compile-time constant argument (see Known Issues, item 2)
  • added the Random128 XOR-Shift pseudo-random number generator for generating (U)Int128 values
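For reference, the full 64x64 to 128-bit product at the core of (U)Int128 multiplication can be computed with 64-bit arithmetic alone by splitting each operand into 32-bit halves, as sketched below in C. The `u128`/`i128` structs and function names are hypothetical. On x86-64 a single one-operand MUL (unsigned) or IMUL (signed) instruction produces the same result; the signed variant here derives the signed high half from the unsigned product by subtracting the other operand once for each negative input.

```c
#include <stdint.h>

/* Hypothetical 128-bit layouts: two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;
typedef struct { uint64_t lo; int64_t hi; } i128;

/* Full unsigned 64x64 -> 128-bit product (schoolbook on 32-bit limbs). */
static u128 mul_64x64_128(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo; /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi; /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo; /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi; /* weight 2^64 */

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128 r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Signed product: reinterpret as unsigned, then correct the high half. */
static i128 smul_64x64_128(int64_t a, int64_t b)
{
    u128 u = mul_64x64_128((uint64_t)a, (uint64_t)b);
    uint64_t hi = u.hi;
    if (a < 0) hi -= (uint64_t)b;
    if (b < 0) hi -= (uint64_t)a;
    i128 r = { u.lo, (int64_t)hi };
    return r;
}
```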

Cube Root

  • added high-performance, high-accuracy (r)cbrt ((reciprocal) cube root) functions for scalar and vector float and double types, based on a research paper from 2021. An optional bool parameter, false by default, lets the caller decide whether negative input values are handled correctly (math.pow(x, 1f/3f) does not handle them)
  • added high-performance intcbrt (integer cube root) functions for all scalar and vector integer types. For signed integer types, an optional bool parameter, false by default, lets the caller decide whether negative input values are handled correctly (math.pow(x, 1f/3f) does not handle them)
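An integer cube root can be computed bit by bit without floating point, using the digit-by-digit method found in Hacker's Delight. The C sketch below is such an implementation for 32-bit values, not MaxMath's algorithm, and `handle_negative` is a hypothetical stand-in for the optional bool parameter described above. When it is set, a negative input is mapped to the negated cube root of its magnitude (rounding toward zero); math.pow(x, 1f/3f) cannot do this, because pow returns NaN for a negative base with a non-integer exponent.

```c
#include <stdbool.h>
#include <stdint.h>

/* floor(cbrt(x)) for 32-bit unsigned x, three input bits per step. */
static uint32_t icbrt_u32(uint32_t x)
{
    uint32_t y = 0;
    for (int s = 30; s >= 0; s -= 3)
    {
        y = 2 * y;
        /* (y + 1)^3 - y^3, scaled to the current bit position. */
        uint32_t b = (3 * y * (y + 1) + 1) << s;
        if (x >= b)
        {
            x -= b;
            y += 1;
        }
    }
    return y;
}

/* Hypothetical signed wrapper with an opt-in negative path. */
static int32_t intcbrt_i32(int32_t x, bool handle_negative)
{
    if (handle_negative && x < 0)
    {
        /* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
        uint32_t magnitude = 0u - (uint32_t)x;
        return -(int32_t)icbrt_u32(magnitude);
    }
    /* Without the flag, the caller promises x >= 0 and the check is skipped. */
    return (int32_t)icbrt_u32((uint32_t)x);
}
```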

Other Additions

  • added a log function for all scalar and vector float and double types with a second parameter b, the logarithm's base
  • added reversebytes functions for all scalar and vector types, which convert between big-endian and little-endian byte order in both directions. All of them (scalar and vector) compile down to single hardware instructions
  • added pow functions with scalar exponents for float and double scalars and vectors, with optimizations for selected constant exponents (not necessarily integer exponents)
  • added overloads of all functions for scalar (s)byte and (u)short arguments to resolve function call ambiguities that were already present in Unity.Mathematics; this may also improve performance in some cases
  • added a static readonly New property to the RandomX XOR-Shift pseudo-random number generators. It calls Environment.TickCount internally (and is thus seeded somewhat randomly), ensures the seed is non-zero, and can be called from Burst native code
  • added fastrcp functions for float scalars and vectors, which are faster (and substantially less accurate) than Burst's implementations with FloatPrecision.Low and FloatMode.Fast
  • added fastrsqrt functions for float scalars and vectors, which are faster (and substantially less accurate) than Burst's implementations with FloatPrecision.Low and FloatMode.Fast
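Byte reversal is its own inverse, which is why a single reversebytes function converts in both directions between big-endian and little-endian order. The shift-and-mask C sketch below uses hypothetical names; GCC and Clang typically recognize this pattern and emit a single bswap instruction on x86, and they also offer `__builtin_bswap16`/`__builtin_bswap32` directly.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t reversebytes_u16(uint16_t x)
{
    /* x is promoted to int; the cast drops the bits shifted past bit 15. */
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Reverse the four bytes of a 32-bit value: byte 3 <-> 0, byte 2 <-> 1. */
static inline uint32_t reversebytes_u32(uint32_t x)
{
    return  (x >> 24)
          | ((x >> 8) & 0x0000FF00u)
          | ((x << 8) & 0x00FF0000u)
          |  (x << 24);
}
```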
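The New property has to guarantee a non-zero seed because zero is a fixed point of XOR-Shift: every shift-and-XOR step maps an all-zero state back to zero, so the generator would return 0 forever. The C sketch below shows Marsaglia's 32-bit xorshift (shift triple 13, 17, 5) seeded from the current time. `time(NULL)` stands in for Environment.TickCount, the struct and function names are hypothetical, and the replacement constant used for a zero seed is an arbitrary assumption, not what MaxMath does.

```c
#include <stdint.h>
#include <time.h>

typedef struct { uint32_t state; } xorshift32;

/* A zero state would be stuck at zero, so substitute a non-zero constant
   (arbitrary choice for this sketch). */
static xorshift32 xorshift32_from_seed(uint32_t seed)
{
    xorshift32 rng = { seed != 0u ? seed : 0x9E3779B9u };
    return rng;
}

/* Loosely analogous to a "New" property: seed from a clock value. */
static xorshift32 xorshift32_new(void)
{
    return xorshift32_from_seed((uint32_t)time(NULL));
}

/* One xorshift32 step; returns the new state as the random value. */
static uint32_t xorshift32_next(xorshift32 *rng)
{
    uint32_t x = rng->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rng->state = x;
    return x;
}
```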

Improvements

  • added AVX and AVX2 code for float8 sin, cos, tan, sincos, asin, acos, atan, atan2, sinh, cosh, tanh, pow, exp, exp2, exp10, log, log2, log10 and fmod (and the % operator)
  • optimized many /, %, * and divrem operations with a scalar compile-time constant argument for (s)byte vectors (see Known Issues, item 2), which Burst previously optimized poorly or not at all
  • added SSE2 fallback code for converting AVX vector types to SSE vector types and vice versa (for example: short16 (256 bit) to byte16 (128 bit))
  • scalar (s)byte and (u)short rol and ror functions now compile down to single hardware instructions
  • improved performance and/or reduced code size of nearly all vector comparison operations (==, > etc.)
  • improved performance of, and added SSE2 fallback code for, bitfield-to-boolean vector conversion (toboolX and thus also select(vector a, vector b, bitmask c))
  • improved performance of intpow functions in general and for when the exponent is a compile time constant
  • improved performance and reduced code size of compareto vector functions (especially for unsigned types)
  • added more optimizations to isdivisible
  • considerably improved performance of intsqrt functions for (u)long and (s)byte scalar and vector types
  • reduced code size of ispow2 vector functions
  • reduced code size of (s)byte vector-by-vector division
  • improved performance of Random64's (u)long4 generation if compiling for AVX2
  • improved performance of (s)byte matrix multiplication
  • reduced code size of vector-by-vector division and divrem functions for (u)short vectors and for (s)byte vectors up to (s)byte8 (and improved performance when compiling for SSE2 only)
  • reduced code size and improved performance of isinrange functions for (u)long vector types
  • reduced code size of ushort vector >= and <= operators for SSE2 fallback code by ~75%
  • improved performance and reduced code size of SSE2 down-casting fallback code
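A standard way to implement an integer power function is exponentiation by squaring, which needs O(log e) multiplications instead of e - 1; when the exponent is a compile-time constant, a compiler can unroll the loop into a fixed multiplication chain. The C sketch below is a generic version with a hypothetical name, not MaxMath's intpow implementation; like unchecked integer arithmetic, it wraps modulo 2^64 on overflow.

```c
#include <stdint.h>

/* base^exponent by repeated squaring; wraps modulo 2^64 on overflow. */
static uint64_t intpow_u64(uint64_t base, uint32_t exponent)
{
    uint64_t result = 1;
    while (exponent != 0)
    {
        /* Multiply in the current power of base for each set exponent bit. */
        if (exponent & 1u)
            result *= base;
        base *= base;   /* base^(2^k) for the next bit */
        exponent >>= 1;
    }
    return result;
}
```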

Changes

  • API BREAKING CHANGE: The various boolean to integer/floating point conversion functions (touint8/tof32 etc.) have been renamed to use C# type names (tobyte/tofloat etc.)
  • API BREAKING CHANGE: If you use this library as intended, importing both it and Unity.Mathematics.math statically (using static MaxMath.maxmath; and using static Unity.Mathematics.math;), and you call pow with a scalar base and a scalar exponent in those scripts, you will encounter the library's first function call resolution ambiguity. It is strongly recommended to always use maxmath.pow, because it optimizes a pow call enormously when the exponent is a compile-time constant. This does NOT necessarily mean that the exponent must be written as a literal; it may also become a compile-time constant through constant propagation
  • quarter is now a readonly struct
  • quarter to sbyte, short, int and long conversions must now be declared explicitly
  • removed countbits(void* ptr, ulong bytes) from the library and added it to https://github.com/MrUnbelievable92/SIMD-Algorithms with more options

Fixed Oversights

  • (Issue #3) added constructor wrappers to the maxmath class, analogous to Unity.Mathematics (byte4 myByte4 = (maxmath.)byte4(1, 2, 3, 4);)
  • added dsub, a fused divide-subtract function, for scalar and vector float types
  • added an optional bool fast = false parameter to dad, dsub, dadsub and dsubadd functions
  • added andnot function overloads for scalar and vector bool types
  • added implicit type conversions of scalar quarter values to half, float and double vectors
  • added all_eq and all_dif functions for vectors of size 2
  • added all_eq and all_dif functions for float and double vectors