v2.3.5
## Known Issues

- `half8` `==` and `!=` operators don't conform to the IEEE 754 standard (compliant with Unity.Mathematics)
- `(s)byte`, `(u)short` vector and `(U)Int128` multiplication, division and modulo operations by compile time constants are not optimal
- optimized `(U)Int128` comparison operators didn't make it into this release
- `bool` vectors generated from operations on non-`(s)byte` vectors do not generate the most optimal machine code possible, partly due to an LLVM performance regression, partly due to other compiler related difficulties
- most vectorized function overloads don't communicate return value ranges to the compiler yet, missing out on more efficient code paths that could be selected at compile time through compile-time-only value range checks
- AVX2 `(s)byte32` `all_dif` lookup tables are currently way too large (kilobytes)

## Fixes

- (Issue 10) `bool8/16/32` are now blittable when not used within an `IJob`

## Additions

- added `comb(n, k)` for scalar and vector integer types. This is known as the binomial coefficient or "n choose k". An optional `Promise` parameter can select an O(1) code path using the factorial formula. The standard approach uses an algorithm that is O(min(k, n - k)) in time and cannot ever overflow unless the result itself overflows (which is not true of most solutions found online that claim it)
- added `perm(n, k)` for scalar and vector integer types. This is known as "k-permutations of n". An optional `Promise` parameter can select an O(1) code path using the factorial formula. The standard approach uses an algorithm that is O(k) in time and cannot ever overflow unless the result itself overflows
- added `nextgreater(x)` for all types. For integer types, it is a wrapper function for `addsaturated(x, 1)`. For floating point types, it returns the next greater representable floating point value(s), unless x is NaN or infinite. An optional `Promise` parameter allows for numerous optimizations.
- added `nextsmaller(x)` for all types. For integer types, it is a wrapper function for `subsaturated(x, 1)`. For floating point types, it returns the next smaller representable floating point value(s), unless x is NaN or infinite. An optional `Promise` parameter allows for numerous optimizations.
- added `nexttoward(from, to)` for all types, returning the next representable integer/floating point value(s) in a given direction, unless `from` is equal to `to`. For floating point types, `from` is returned if `from` is NaN or infinite. If `to` is NaN, NaN is returned. An optional `Promise` parameter allows for numerous optimizations.
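
As an illustration of two of the additions above, here is a minimal scalar sketch in plain C#. It is not MaxMath's implementation: the class name `AdditionsSketch`, the gcd-based cancellation, the choice to return 0 when `k > n`, and the handling of `float.MaxValue` are all assumptions made for this example. `Comb` shows one well-known way to keep every intermediate value no larger than the final result (the gcd adds a logarithmic cost per step that the library may well avoid). `NextGreater` steps the raw bit pattern of a `float`, returning NaN and infinities unchanged as the notes describe.

```csharp
using System;

public static class AdditionsSketch
{
    // Euclid's algorithm; used to cancel common factors before multiplying.
    private static ulong Gcd(ulong a, ulong b)
    {
        while (b != 0)
        {
            ulong t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // "n choose k" in min(k, n - k) loop iterations.
    // After iteration i, c == C(n - k + i, i), which never exceeds the final
    // result, so an intermediate overflows only if the result itself does.
    public static ulong Comb(ulong n, ulong k)
    {
        if (k > n) return 0; // assumption of this sketch
        k = Math.Min(k, n - k);

        ulong c = 1;
        for (ulong i = 1; i <= k; i++)
        {
            ulong g = Gcd(c, i);
            // c / g and i / g are coprime, so i / g divides (n - k + i) exactly.
            c = (c / g) * ((n - k + i) / (i / g));
        }
        return c;
    }

    // Next representable float above x; NaN and infinities are returned as-is.
    public static float NextGreater(float x)
    {
        if (float.IsNaN(x) || float.IsInfinity(x)) return x;
        if (x == 0f) return float.Epsilon; // covers both +0 and -0

        int bits = BitConverter.SingleToInt32Bits(x);
        // Positive values grow with their bit pattern, negative values shrink.
        bits += x > 0f ? 1 : -1;
        return BitConverter.Int32BitsToSingle(bits);
    }
}
```

A naive `c * (n - k + i) / i` loop would overflow 64 bits for inputs such as `Comb(67, 33)` even though the result still fits in a `ulong`. The cancellation step avoids that.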

## Improvements

- improved performance of 64-bit vectorized division thanks to a newly implemented and further optimized algorithm from [a July 13th 2022 research paper](https://hal.archives-ouvertes.fr/hal-03722203/document). It replaces a vectorized loop (rather slow; up to 64 iterations; no instruction level parallelism outside the loop possible until the loop finished executing, following an almost certainly mispredicted branch) with straight line code. Due to "recent" improvements to divider circuits, this code path is inferior to hardware supported scalar division via element extraction for `(u)long2` specifically, even when the quotient and/or remainder vector is in the middle of a dependency chain and even in tight loops. It is thus only implemented for `(u)long3/4` types and only if compiling for AVX2
- improved performance and reduced code size of up to `(s)byte8` and every `(u)short` vector division if _not_ compiling with `FloatMode.Fast`. Reduced constants _possibly_ read from RAM in either case.
- fixed performance regression of SIMD register <-> software abstraction conversions for types using up the entirety of a hardware register
- `lcm` for `(s)byte` vectors with 8 elements or less: decreased code size by 20 or 28 bytes; removed 2 or 4 or 8 bytes of constant data read from RAM; reduced latency by 2 or 3 clock cycles
- verified and increased the `(u)long` scalar- and vector `intcbrt` `Promise.Unsafe0` range from [0, 1ul << 40] to [0, 1ul << 46], the code path of which is also possibly chosen at compile time
- implemented optimized `quarter{X}` IEEE-754 comparison operators (without having to cast to `float{X}`). Vectorized `half{X}` comparisons are implemented in `MaxMath.Intrinsics.Xse` as well and used where appropriate. `compareto` function overloads for `quarter{X}` and `half{X}` were implemented.
- reduced latency of `add/subsaturated` for scalar `Int128`s, scalar and vector `long`s as well as vector `int`s by about a third
- replaced the call to `BigInteger.ToString()` in `(U)Int128.ToString(null, null)`, and thus unnecessary heap allocations, with an optimized implementation
- `(u)short8` `/` and `%` operators now correctly check for SSE2 support rather than AVX2
- removed aliased fixed size buffers from all types, also improving indexer operator performance if the index is a compile time constant (in some cases)
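
For readers unfamiliar with the saturated arithmetic behind `add/subsaturated` (and behind `nextgreater`/`nextsmaller` for integers), the following scalar sketch shows the semantics for `long`. It does not show how MaxMath achieves the reduced latency, which the notes do not describe. The class and method names are hypothetical, and the sign-bit overflow test is just one common formulation.

```csharp
public static class SaturationSketch
{
    // a + b, clamped to [long.MinValue, long.MaxValue] instead of wrapping.
    public static long AddSaturated(long a, long b)
    {
        long sum = unchecked(a + b);
        // Overflow happened iff a and b share a sign and sum has the other one.
        if (((a ^ sum) & (b ^ sum)) < 0)
        {
            return a < 0 ? long.MinValue : long.MaxValue;
        }
        return sum;
    }

    // a - b, clamped to [long.MinValue, long.MaxValue] instead of wrapping.
    public static long SubSaturated(long a, long b)
    {
        long diff = unchecked(a - b);
        // Overflow happened iff a and b differ in sign and diff's sign differs from a's.
        if (((a ^ b) & (a ^ diff)) < 0)
        {
            return a < 0 ? long.MinValue : long.MaxValue;
        }
        return diff;
    }
}
```

With these semantics, `AddSaturated(x, 1)` returns `x + 1` everywhere except at `long.MaxValue`, where it returns `long.MaxValue` again. That is the behaviour the notes describe for the integer `nextgreater`.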

## Changes

- Burst compiled code that uses a `Promise` argument which is _not a compile time constant_ will throw an exception in `DEBUG`, as it represents significant overhead instead of an optimization. The exception currently does not name the offending function, only the Burst compiled job/function that threw it.
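
To show why a non-constant `Promise` is overhead rather than an optimization, here is a self-contained analogue of the pattern. Everything in it is hypothetical: `PromiseSketch` and `PromiseDemo` are stand-ins invented for this example, and the floor-rounding average is not claimed to match MaxMath's `avg`. Only the idea of a defaulted promise argument such as `Promise.NoOverflow` comes from the code in this commit.

```csharp
using System;

[Flags]
public enum PromiseSketch : byte
{
    Nothing = 0,
    NoOverflow = 1
}

public static class PromiseDemo
{
    // Floor average of two unsigned integers.
    public static uint Avg(uint x, uint y, PromiseSketch promise = PromiseSketch.Nothing)
    {
        if ((promise & PromiseSketch.NoOverflow) != 0)
        {
            // Caller guarantees x + y fits in 32 bits: one add and one shift.
            return (x + y) >> 1;
        }

        // General path: cannot overflow, at the cost of extra bit operations.
        return (x & y) + ((x ^ y) >> 1);
    }
}
```

When the argument is a literal such as `PromiseSketch.NoOverflow`, an optimizing compiler can fold the test away and keep only one path. When it is a runtime value, the test and both paths stay in the generated code, which is the situation the `DEBUG` exception is meant to flag.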

## Fixed Oversights

- added `explicit` type conversion operators for scalar `float`s and `double`s to `half8` and all `quarter` vectors (as well as scalar `half`s to `quarter` vectors)
MrUnbelievable92 committed Oct 20, 2022
1 parent 981f38f commit d3aa390
Showing 376 changed files with 38,508 additions and 9,423 deletions.
6 changes: 3 additions & 3 deletions Runtime/AssemblyInfo.cs
@@ -32,9 +32,9 @@
// Build Number
// Revision
//
-[assembly: AssemblyVersion("2.3.0")]
-[assembly: AssemblyFileVersion("2.3.0")]
-[assembly: AssemblyInformationalVersion("2.3 Release")]
+[assembly: AssemblyVersion("2.3.5")]
+[assembly: AssemblyFileVersion("2.3.5")]
+[assembly: AssemblyInformationalVersion("2.3.5 Release")]

[assembly: SuppressMessage("Style", "IDE1006:Naming Styles", Justification = "Unity.Mathematics API consistency")]
[assembly: CompilationRelaxationsAttribute(CompilationRelaxations.NoStringInterning)]
6 changes: 6 additions & 0 deletions Runtime/Math Lib/Constants.cs
@@ -11,6 +11,9 @@ unsafe public static partial class maxmath
/// <summary> The square root 3. Approximately 1.73. This is a f64/double precision constant. </summary>
public const double SQRT3_DBL = 1.73205080756887729352d;

+/// <summary> The square root 5. Approximately 2.23. This is a f64/double precision constant. </summary>
+public const double SQRT5_DBL = 2.23606797749978969640d;
+
/// <summary> The cube root of 2. Approximately 1.26. This is a f64/double precision constant. </summary>
public const double CBRT2_DBL = 1.25992104989487316476d;

@@ -27,6 +30,9 @@ unsafe public static partial class maxmath
/// <summary> The square root of 3. Approximately 1.73. </summary>
public const float SQRT3 = 1.73205080f;

+/// <summary> The square root of 5. Approximately 2.23. </summary>
+public const float SQRT5 = 2.23606797f;
+
/// <summary> The cube root of 2. Approximately 1.26. </summary>
public const float CBRT2 = 1.25992104f;

18 changes: 9 additions & 9 deletions Runtime/Math Lib/Functions/Arithmetic/Add-Subtract.cs
@@ -264,7 +264,7 @@ public static float2 addsub(float2 a, float2 b)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float2>(Xse.subadd_ps(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
+return RegisterConversion.ToFloat2(Xse.subadd_ps(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
}
else
{
@@ -278,7 +278,7 @@ public static float3 addsub(float3 a, float3 b)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float3>(Xse.subadd_ps(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
+return RegisterConversion.ToFloat3(Xse.subadd_ps(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
}
else
{
@@ -292,7 +292,7 @@ public static float4 addsub(float4 a, float4 b)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float4>(Xse.subadd_ps(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
+return RegisterConversion.ToFloat4(Xse.subadd_ps(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
}
else
{
@@ -321,7 +321,7 @@ public static double2 addsub(double2 a, double2 b)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<double2>(Xse.subadd_pd(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
+return RegisterConversion.ToDouble2(Xse.subadd_pd(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b)));
}
else
{
@@ -335,7 +335,7 @@ public static double3 addsub(double3 a, double3 b)
{
if (Avx.IsAvxSupported)
{
-return RegisterConversion.ToType<double3>(Xse.mm256_subadd_pd(RegisterConversion.ToV256(a), RegisterConversion.ToV256(b)));
+return RegisterConversion.ToDouble3(Xse.mm256_subadd_pd(RegisterConversion.ToV256(a), RegisterConversion.ToV256(b)));
}
else
{
@@ -349,7 +349,7 @@ public static double4 addsub(double4 a, double4 b)
{
if (Avx.IsAvxSupported)
{
-return RegisterConversion.ToType<double4>(Xse.mm256_subadd_pd(RegisterConversion.ToV256(a), RegisterConversion.ToV256(b)));
+return RegisterConversion.ToDouble4(Xse.mm256_subadd_pd(RegisterConversion.ToV256(a), RegisterConversion.ToV256(b)));
}
else
{
@@ -599,7 +599,7 @@ public static uint2 addsub(uint2 a, uint2 b)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<uint2>(Xse.subadd_epi32(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b), 2));
+return RegisterConversion.ToUInt2(Xse.subadd_epi32(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b), 2));
}
else
{
@@ -613,7 +613,7 @@ public static uint3 addsub(uint3 a, uint3 b)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<uint3>(Xse.subadd_epi32(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b), 3));
+return RegisterConversion.ToUInt3(Xse.subadd_epi32(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b), 3));
}
else
{
@@ -627,7 +627,7 @@ public static uint4 addsub(uint4 a, uint4 b)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<uint4>(Xse.subadd_epi32(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b), 4));
+return RegisterConversion.ToUInt4(Xse.subadd_epi32(RegisterConversion.ToV128(a), RegisterConversion.ToV128(b), 4));
}
else
{
12 changes: 6 additions & 6 deletions Runtime/Math Lib/Functions/Arithmetic/Average.cs
@@ -811,7 +811,7 @@ public static uint2 avg(uint2 x, uint2 y, Promise noOverflow = Promise.Nothing)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<uint2>(Xse.avg_epu32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow)));
+return RegisterConversion.ToUInt2(Xse.avg_epu32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow)));
}
else
{
@@ -833,7 +833,7 @@ public static uint3 avg(uint3 x, uint3 y, Promise noOverflow = Promise.Nothing)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<uint3>(Xse.avg_epu32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow)));
+return RegisterConversion.ToUInt3(Xse.avg_epu32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow)));
}
else
{
@@ -855,7 +855,7 @@ public static uint4 avg(uint4 x, uint4 y, Promise noOverflow = Promise.Nothing)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<uint4>(Xse.avg_epu32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow)));
+return RegisterConversion.ToUInt4(Xse.avg_epu32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow)));
}
else
{
@@ -902,7 +902,7 @@ public static int2 avg(int2 x, int2 y, Promise noOverflow = Promise.Nothing)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<int2>(Xse.avg_epi32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow), 2));
+return RegisterConversion.ToInt2(Xse.avg_epi32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow), 2));
}
else
{
@@ -917,7 +917,7 @@ public static int3 avg(int3 x, int3 y, Promise noOverflow = Promise.Nothing)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<int3>(Xse.avg_epi32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow), 3));
+return RegisterConversion.ToInt3(Xse.avg_epi32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow), 3));
}
else
{
@@ -932,7 +932,7 @@ public static int4 avg(int4 x, int4 y, Promise noOverflow = Promise.Nothing)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<int4>(Xse.avg_epi32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow), 4));
+return RegisterConversion.ToInt4(Xse.avg_epi32(RegisterConversion.ToV128(x), RegisterConversion.ToV128(y), noOverflow.Promises(Promise.NoOverflow), 4));
}
else
{
24 changes: 12 additions & 12 deletions Runtime/Math Lib/Functions/Arithmetic/Divide With Remainder.cs
@@ -1192,8 +1192,8 @@ public static int2 divrem(int2 dividend, int2 divisor, out int2 remainder)
{
if (Sse2.IsSse2Supported)
{
-int2 ret = RegisterConversion.ToType<int2>(Xse.divrem_epi32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 2));
-remainder = RegisterConversion.ToType<int2>(rem);
+int2 ret = RegisterConversion.ToInt2(Xse.divrem_epi32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 2));
+remainder = RegisterConversion.ToInt2(rem);

return ret;
}
@@ -1211,8 +1211,8 @@ public static int3 divrem(int3 dividend, int3 divisor, out int3 remainder)
{
if (Sse2.IsSse2Supported)
{
-int3 ret = RegisterConversion.ToType<int3>(Xse.divrem_epi32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 3));
-remainder = RegisterConversion.ToType<int3>(rem);
+int3 ret = RegisterConversion.ToInt3(Xse.divrem_epi32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 3));
+remainder = RegisterConversion.ToInt3(rem);

return ret;
}
@@ -1230,8 +1230,8 @@ public static int4 divrem(int4 dividend, int4 divisor, out int4 remainder)
{
if (Sse2.IsSse2Supported)
{
-int4 ret = RegisterConversion.ToType<int4>(Xse.divrem_epi32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 4));
-remainder = RegisterConversion.ToType<int4>(rem);
+int4 ret = RegisterConversion.ToInt4(Xse.divrem_epi32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 4));
+remainder = RegisterConversion.ToInt4(rem);

return ret;
}
@@ -1280,8 +1280,8 @@ public static uint2 divrem(uint2 dividend, uint2 divisor, out uint2 remainder)
{
if (Sse2.IsSse2Supported)
{
-uint2 ret = RegisterConversion.ToType<uint2>(Xse.divrem_epu32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 2));
-remainder = RegisterConversion.ToType<uint2>(rem);
+uint2 ret = RegisterConversion.ToUInt2(Xse.divrem_epu32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 2));
+remainder = RegisterConversion.ToUInt2(rem);

return ret;
}
@@ -1299,8 +1299,8 @@ public static uint3 divrem(uint3 dividend, uint3 divisor, out uint3 remainder)
{
if (Sse2.IsSse2Supported)
{
-uint3 ret = RegisterConversion.ToType<uint3>(Xse.divrem_epu32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 3));
-remainder = RegisterConversion.ToType<uint3>(rem);
+uint3 ret = RegisterConversion.ToUInt3(Xse.divrem_epu32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 3));
+remainder = RegisterConversion.ToUInt3(rem);

return ret;
}
@@ -1318,8 +1318,8 @@ public static uint4 divrem(uint4 dividend, uint4 divisor, out uint4 remainder)
{
if (Sse2.IsSse2Supported)
{
-uint4 ret = RegisterConversion.ToType<uint4>(Xse.divrem_epu32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 4));
-remainder = RegisterConversion.ToType<uint4>(rem);
+uint4 ret = RegisterConversion.ToUInt4(Xse.divrem_epu32(RegisterConversion.ToV128(dividend), RegisterConversion.ToV128(divisor), out v128 rem, 4));
+remainder = RegisterConversion.ToUInt4(rem);

return ret;
}
Original file line number Diff line number Diff line change
@@ -15,7 +15,7 @@ public static float2 dadsub(float2 a, float2 b, float2 c, bool fast = false)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float2>(Xse.fmsubadd_ps(RegisterConversion.ToV128(a),
+return RegisterConversion.ToFloat2(Xse.fmsubadd_ps(RegisterConversion.ToV128(a),
fast ? Sse.rcp_ps(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -31,7 +31,7 @@ public static float3 dadsub(float3 a, float3 b, float3 c, bool fast = false)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float3>(Xse.fmsubadd_ps(RegisterConversion.ToV128(a),
+return RegisterConversion.ToFloat3(Xse.fmsubadd_ps(RegisterConversion.ToV128(a),
fast ? Sse.rcp_ps(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -47,7 +47,7 @@ public static float4 dadsub(float4 a, float4 b, float4 c, bool fast = false)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float4>(Xse.fmsubadd_ps(RegisterConversion.ToV128(a),
+return RegisterConversion.ToFloat4(Xse.fmsubadd_ps(RegisterConversion.ToV128(a),
fast ? Sse.rcp_ps(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -78,7 +78,7 @@ public static double2 dadsub(double2 a, double2 b, double2 c, bool fast = false)
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<double2>(Xse.fmsubadd_pd(RegisterConversion.ToV128(a),
+return RegisterConversion.ToDouble2(Xse.fmsubadd_pd(RegisterConversion.ToV128(a),
fast ? Xse.rcp_pd(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -105,7 +105,7 @@ public static double3 dadsub(double3 a, double3 b, double3 c, bool fast = false)
divisor = RegisterConversion.ToV256(math.rcp(b));
}

-return RegisterConversion.ToType<double3>(Xse.mm256_fmsubadd_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
+return RegisterConversion.ToDouble3(Xse.mm256_fmsubadd_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
}
else
{
@@ -130,7 +130,7 @@ public static double4 dadsub(double4 a, double4 b, double4 c, bool fast = false)
divisor = RegisterConversion.ToV256(math.rcp(b));
}

-return RegisterConversion.ToType<double4>(Xse.mm256_fmsubadd_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
+return RegisterConversion.ToDouble4(Xse.mm256_fmsubadd_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
}
else
{
Original file line number Diff line number Diff line change
@@ -15,7 +15,7 @@ public static float2 dsubadd(float2 a, float2 b, float2 c, bool fast = false)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float2>(Xse.fmaddsub_ps(RegisterConversion.ToV128(a),
+return RegisterConversion.ToFloat2(Xse.fmaddsub_ps(RegisterConversion.ToV128(a),
fast ? Sse.rcp_ps(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -31,7 +31,7 @@ public static float3 dsubadd(float3 a, float3 b, float3 c, bool fast = false)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float3>(Xse.fmaddsub_ps(RegisterConversion.ToV128(a),
+return RegisterConversion.ToFloat3(Xse.fmaddsub_ps(RegisterConversion.ToV128(a),
fast ? Sse.rcp_ps(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -47,7 +47,7 @@ public static float4 dsubadd(float4 a, float4 b, float4 c, bool fast = false)
{
if (Sse.IsSseSupported)
{
-return RegisterConversion.ToType<float4>(Xse.fmaddsub_ps(RegisterConversion.ToV128(a),
+return RegisterConversion.ToFloat4(Xse.fmaddsub_ps(RegisterConversion.ToV128(a),
fast ? Sse.rcp_ps(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -78,7 +78,7 @@ public static double2 dsubadd(double2 a, double2 b, double2 c, bool fast = false
{
if (Sse2.IsSse2Supported)
{
-return RegisterConversion.ToType<double2>(Xse.fmaddsub_pd(RegisterConversion.ToV128(a),
+return RegisterConversion.ToDouble2(Xse.fmaddsub_pd(RegisterConversion.ToV128(a),
fast ? Xse.rcp_pd(RegisterConversion.ToV128(b)) : RegisterConversion.ToV128(math.rcp(b)),
RegisterConversion.ToV128(c)));
}
@@ -105,7 +105,7 @@ public static double3 dsubadd(double3 a, double3 b, double3 c, bool fast = false
divisor = RegisterConversion.ToV256(math.rcp(b));
}

-return RegisterConversion.ToType<double3>(Xse.mm256_fmaddsub_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
+return RegisterConversion.ToDouble3(Xse.mm256_fmaddsub_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
}
else
{
@@ -130,7 +130,7 @@ public static double4 dsubadd(double4 a, double4 b, double4 c, bool fast = false
divisor = RegisterConversion.ToV256(math.rcp(b));
}

-return RegisterConversion.ToType<double4>(Xse.mm256_fmaddsub_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
+return RegisterConversion.ToDouble4(Xse.mm256_fmaddsub_ps(RegisterConversion.ToV256(a), divisor, RegisterConversion.ToV256(c)));
}
else
{