## Known Issues

- `half8` `==` and `!=` operators don't conform to the IEEE 754 standard (matching Unity.Mathematics)
- multiplication, division and modulo operations by compile time constants are not optimal for `(s)byte` and `(u)short` vectors and for `(U)Int128`
- optimized `(U)Int128` comparison operators didn't make it into this release
- `bool` vectors generated from operations on non-`(s)byte` vectors do not result in the most optimal machine code possible, partly due to an LLVM performance regression and partly due to other compiler related difficulties
- most vectorized function overloads don't communicate return value ranges to the compiler yet, so they miss out on more efficient code paths that could be selected at compile time via compile time value range checks
- AVX2 `(s)byte32` `all_dif` lookup tables are currently far too large (on the order of kilobytes)

## Fixes

- (Issue 10) `bool8/16/32` are now blittable when not used within an `IJob`

## Additions

- added `comb(n, k)` for scalar and vector integer types. This is known as the binomial coefficient or "n choose k". An optional `Promise` parameter can select an O(1) code path using the factorial formula, whereas the standard approach uses an O(min(k, n - k)) time algorithm that cannot overflow unless the result itself overflows (unlike most solutions found online that claim this property)
- added `perm(n, k)` for scalar and vector integer types. This is known as "k-permutations of n". An optional `Promise` parameter can select an O(1) code path using the factorial formula, whereas the standard approach uses an O(k) time algorithm that cannot overflow unless the result itself overflows
- added `nextgreater(x)` for all types. For integer types, it is a wrapper function for `addsaturated(x, 1)`. For floating point types, it returns the next greater representable floating point value(s), unless x is NaN or infinite. An optional `Promise` parameter allows for numerous optimizations.
- added `nextsmaller(x)` for all types. For integer types, it is a wrapper function for `subsaturated(x, 1)`. For floating point types, it returns the next smaller representable floating point value(s), unless x is NaN or infinite. An optional `Promise` parameter allows for numerous optimizations.
- added `nexttoward(from, to)` for all types, returning the next representable integer/floating point value(s) in the direction of `to`, unless `from` is equal to `to`. For floating point types, `from` is returned if `from` is NaN or infinite. If `to` is NaN, NaN is returned. An optional `Promise` parameter allows for numerous optimizations.

## Improvements

- improved performance of 64 bit vectorized division thanks to a newly implemented and further optimized algorithm from [a July 13th 2022 research paper](https://hal.archives-ouvertes.fr/hal-03722203/document). It replaces a vectorized loop with straight line code. The loop was rather slow: it ran up to 64 iterations, and no instruction level parallelism outside the loop was possible until it finished executing, which ended with an almost certainly mispredicted branch. Due to "recent" improvements to divider circuits, this code path is inferior to hardware supported scalar division via element extraction for `(u)long2` specifically, even when the quotient and/or remainder vector is in the middle of a dependency chain and even in tight loops. It is therefore only implemented for `(u)long3/4` types, and only when compiling for AVX2
- improved performance and reduced code size of division for `(s)byte` vectors with up to 8 elements and for all `(u)short` vectors when _not_ compiling with `FloatMode.Fast`. In either case, less constant data is _possibly_ read from RAM
- fixed a performance regression of SIMD register <-> software abstraction conversions for types that occupy an entire hardware register
- `lcm` for `(s)byte` vectors with 8 elements or fewer: decreased code size by 20 or 28 bytes; removed 2, 4 or 8 bytes of constant data read from RAM; reduced latency by 2 or 3 clock cycles
- verified and increased the `(u)long` scalar and vector `intcbrt` `Promise.Unsafe0` range from [0, 1ul << 40] to [0, 1ul << 46]; this code path may also be chosen at compile time
- implemented optimized `quarter{X}` IEEE 754 comparison operators (without having to cast to `float{X}`). Vectorized `half{X}` comparisons are implemented in `MaxMath.Intrinsics.Xse` as well and used where appropriate. Also implemented `compareto` function overloads for `quarter{X}` and `half{X}`
- reduced latency of `add/subsaturated` by about a third for scalar `Int128`s, scalar and vector `long`s, and vector `int`s
- replaced the call to `BigInteger.ToString()` in `(U)Int128.ToString(null, null)`, along with the unnecessary heap allocations it caused, with an optimized implementation
- `(u)short8` `/` and `%` operators now correctly check for SSE2 support rather than AVX2
- removed aliased fixed size buffers from all types, which in some cases also improves indexer operator performance when the index is a compile time constant

## Changes

- Burst compiled code that uses a `Promise` argument which is _not a compile time constant_ will throw an exception in `DEBUG`, because such an argument causes significant overhead instead of acting as an optimization. The exception currently does not name the offending function, only the Burst compiled job/function that threw it.

## Fixed Oversights

- added `explicit` type conversion operators from scalar `float`s and `double`s to `half8` and all `quarter` vectors (as well as from scalar `half`s to `quarter` vectors)
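
The overflow-safety claim for the standard `comb(n, k)` and `perm(n, k)` code paths can be illustrated with a minimal Python sketch that emulates a 64-bit unsigned limit. It assumes the well-known multiplicative formula with gcd cancellation for `comb` and a plain running product for `perm`. These are generic techniques, not necessarily MaxMath's implementation, and the names `comb_u64`/`perm_u64` are hypothetical.

```python
import math

U64_MAX = (1 << 64) - 1  # emulated 64-bit unsigned limit (assumption for illustration)


def comb_u64(n: int, k: int) -> int:
    """Hypothetical sketch: C(n, k) in O(min(k, n - k)) steps.

    After step i the running value equals C(n - k + i, i) <= C(n, k), and no
    product ever exceeds that value, so 64 bits are only exceeded if the
    final result does not fit.
    """
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        m = n - k + i  # next numerator factor
        # Naively, result * m could overflow before the division by i.
        # Cancelling g = gcd(result, i) first leaves i // g coprime to
        # result // g, so i // g must divide m exactly.
        g = math.gcd(result, i)
        result = (result // g) * (m // (i // g))  # == C(n - k + i, i)
        if result > U64_MAX:
            raise OverflowError("C(n, k) does not fit in 64 bits")
    return result


def perm_u64(n: int, k: int) -> int:
    """Hypothetical sketch: P(n, k) = n * (n - 1) * ... * (n - k + 1) in O(k) steps.

    Every factor is >= 1, so the running product never decreases and only
    exceeds 64 bits if the final result does.
    """
    if k < 0 or k > n:
        return 0
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
        if result > U64_MAX:
            raise OverflowError("P(n, k) does not fit in 64 bits")
    return result


# The factorial formulas n! / (k! * (n - k)!) and n! / (n - k)! avoid the loop,
# but n! itself no longer fits in 64 bits once n > 20.
print(comb_u64(67, 33))  # largest C(67, k); fits in 64 bits, unlike naive result * m
print(perm_u64(10, 3))
```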
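
For floating point types, `nextgreater`, `nextsmaller` and `nexttoward` can be understood as integer steps on the IEEE 754 bit pattern. The following Python sketch shows this for float64 using `struct` reinterpretation and follows the semantics described under Additions. It is not MaxMath's implementation, and which check `nexttoward` performs first when `to` is NaN and `from` is infinite is an assumption.

```python
import math
import struct


def _bits(x: float) -> int:
    """Reinterpret a float64 as its raw unsigned 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def _from_bits(b: int) -> float:
    """Reinterpret a raw unsigned 64-bit pattern as a float64."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def nextgreater(x: float) -> float:
    """Sketch: next greater float64; NaN and +/-infinity are returned unchanged."""
    if math.isnan(x) or math.isinf(x):
        return x
    if x == 0.0:  # +0.0 and -0.0 both step to the smallest positive subnormal
        return _from_bits(1)
    b = _bits(x)
    # IEEE 754 floats are sign-magnitude encoded: for positive x a larger bit
    # pattern is a larger value; for negative x a smaller magnitude is larger.
    return _from_bits(b + 1 if x > 0.0 else b - 1)


def nextsmaller(x: float) -> float:
    """Sketch: next smaller float64, using the symmetry of the encoding."""
    return -nextgreater(-x)


def nexttoward(frm: float, to: float) -> float:
    """Sketch of the described semantics (check order for NaN `to` is an assumption)."""
    if math.isnan(to):
        return to
    if math.isnan(frm) or math.isinf(frm) or frm == to:
        return frm
    return nextgreater(frm) if to > frm else nextsmaller(frm)
```

Python's own `math.nextafter` (3.9+) agrees with this sketch for finite inputs. For infinite inputs it steps toward finite values instead (for example, `math.nextafter(math.inf, 0.0)` is the largest finite double), whereas the changelog specifies that infinite inputs are returned unchanged.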
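
The semantics of `add/subsaturated`, which integer `nextgreater`/`nextsmaller` wrap, can be sketched in Python by emulating 64-bit two's complement wrap-around and detecting signed overflow from the signs of the operands and the result. This is a generic textbook technique, not MaxMath's reduced-latency implementation, and the helper names are hypothetical.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1
MASK64 = (1 << 64) - 1


def _wrap64(v: int) -> int:
    """Two's complement wrap-around, as a 64-bit hardware adder would produce."""
    v &= MASK64
    return v - (1 << 64) if v >= (1 << 63) else v


def addsaturated(a: int, b: int) -> int:
    """Hypothetical sketch of signed 64-bit saturating addition."""
    s = _wrap64(a + b)
    # a + b can only overflow when a and b have the same sign; it did
    # overflow if the wrapped sum's sign differs from that sign.
    if (a < 0) == (b < 0) and (s < 0) != (a < 0):
        return INT64_MIN if a < 0 else INT64_MAX
    return s


def subsaturated(a: int, b: int) -> int:
    """Hypothetical sketch of signed 64-bit saturating subtraction."""
    d = _wrap64(a - b)
    # a - b can only overflow when a and b have different signs; it did
    # overflow if the wrapped difference's sign differs from a's sign.
    if (a < 0) != (b < 0) and (d < 0) != (a < 0):
        return INT64_MIN if a < 0 else INT64_MAX
    return d


def nextgreater_int64(x: int) -> int:
    """Integer nextgreater as described in the changelog: addsaturated(x, 1)."""
    return addsaturated(x, 1)
```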
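
Comparing minifloats such as `quarter` without casting to `float` is possible with integer operations on the raw bits, because IEEE 754 values are sign-magnitude encoded. The Python sketch below assumes a hypothetical 8-bit layout of 1 sign bit, 4 exponent bits and 3 mantissa bits (`EXP_BITS`/`MAN_BITS` would need to match the actual `quarter` format). The function names are illustrative, not MaxMath's API. A vectorized version would apply the same integer operations lane by lane.

```python
EXP_BITS = 4  # assumption: hypothetical 1-4-3 layout, not necessarily MaxMath's `quarter`
MAN_BITS = 3
SIGN = 1 << (EXP_BITS + MAN_BITS)             # sign bit (0x80)
MAG_MASK = SIGN - 1                           # exponent + mantissa bits (0x7F)
INF_MAG = ((1 << EXP_BITS) - 1) << MAN_BITS   # all-ones exponent, zero mantissa


def is_nan(q: int) -> bool:
    # magnitudes above the infinity pattern have an all-ones exponent and a nonzero mantissa
    return (q & MAG_MASK) > INF_MAG


def _key(q: int) -> int:
    """Map sign-magnitude bits to an integer that orders like the encoded value.
    +0 and -0 both map to 0, so they compare equal as IEEE 754 requires."""
    mag = q & MAG_MASK
    return -mag if q & SIGN else mag


def q_lt(a: int, b: int) -> bool:
    return not (is_nan(a) or is_nan(b)) and _key(a) < _key(b)


def q_le(a: int, b: int) -> bool:
    return not (is_nan(a) or is_nan(b)) and _key(a) <= _key(b)


def q_eq(a: int, b: int) -> bool:
    # NaN compares unequal to everything, including itself
    return not (is_nan(a) or is_nan(b)) and _key(a) == _key(b)


def decode(q: int) -> float:
    """Reference decoder for the assumed layout, used only to check the comparisons."""
    bias = (1 << (EXP_BITS - 1)) - 1
    sign = -1.0 if q & SIGN else 1.0
    exp = (q & MAG_MASK) >> MAN_BITS
    man = q & ((1 << MAN_BITS) - 1)
    if exp == (1 << EXP_BITS) - 1:
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:  # subnormal
        return sign * man * 2.0 ** (1 - bias - MAN_BITS)
    return sign * (1 + man / (1 << MAN_BITS)) * 2.0 ** (exp - bias)
```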

Commit d3aa390 (parent 981f38f): 376 changed files with 38,508 additions and 9,423 deletions.