VkFFT v1.3.2 #141

DTolm · 2023-10-23T11:22:46Z

-Added double-double support in VkFFT. It requires CPU initialization in full quad precision, so for now only GCC (with the quadmath dependency) is supported. Full FP128 support, or another FP128 library (such as MPIR), may be added in the future.
-Data has to be stored in double-double before VkFFT kernel calls (there is no FP128<->double-double conversion on the GPU yet).
-Double-double gives ~1e-32 precision, but the same exponent range as FP64. See "Library for Double-Double and Quad-Double Arithmetic" by Y. Hida et al. for more information on double-double.
-Double-double requires FMA contraction to be disabled (due to a rounding mismatch when ab-cd is contracted). This doesn't work on Vulkan yet, as I haven't found a way to disable contraction there.
-Added DST I-IV support.
-Fixed warnings (#138)
-Added a proper check that app is zero-initialized before the initializeVkFFT call, and zeroing of app on deletion (#134)
-Added an option to provide a staging buffer in the application and VkGPU handle (#129)
-Added guards for build type (#128)
-Changed default innermost stride for real buffers in out-of-place R2C from size[0]+2 to size[0] (#139)
-Allow specifying glslang version (#135)
-Improved instruction count and accuracy for radix-7.
-Fixed missing deallocation calls for the inverse Bluestein axes. Fixed the buffer layout size in Vulkan in some cases.
-Refactored the code generator and container struct layout for better handling of complex numbers (-5k lines of code).
-Added more precision tests and benchmarks.
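A double-double value stores a number as the unevaluated sum hi + lo of two doubles. The minimal C sketch below follows the error-free transformations described by Hida et al. (Knuth's TwoSum, Veltkamp splitting, Dekker's TwoProduct). All names (`dd`, `two_sum`, `dd_add`, `dd_mul`, ...) are hypothetical and this is not VkFFT's generated code; `dd_add` is the cheaper "sloppy" addition, which can lose accuracy when the operands nearly cancel. The Veltkamp split is one place where contraction hurts: if the compiler fuses `splitter * a` into the following subtraction, `hi` and `lo` are no longer the intended halves, so the sketch assumes contraction is off (for example `-ffp-contract=off` with GCC or Clang).

```c
/* Minimal double-double sketch (hypothetical names, not VkFFT code).
   Assumes IEEE-754 binary64, round-to-nearest, and no FMA contraction
   (e.g. compile with -ffp-contract=off on GCC/Clang). */

/* Unevaluated sum hi + lo, with |lo| at most half an ulp of hi. */
typedef struct { double hi, lo; } dd;

/* Knuth TwoSum: s + e == a + b exactly, for any finite a, b. */
static void two_sum(double a, double b, double *s, double *e) {
    double sum = a + b;
    double bb = sum - a;
    *e = (a - (sum - bb)) + (b - bb);
    *s = sum;
}

/* Fast TwoSum: same result as two_sum, but requires |a| >= |b|. */
static void quick_two_sum(double a, double b, double *s, double *e) {
    double sum = a + b;
    *e = b - (sum - a);
    *s = sum;
}

/* Veltkamp split: a == hi + lo exactly, with halves short enough that
   products of halves are exact. Breaks if splitter * a gets fused
   with the subtraction below. */
static void split(double a, double *hi, double *lo) {
    const double splitter = 134217729.0; /* 2^27 + 1 */
    double t = splitter * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker TwoProduct: p + e == a * b exactly (ignoring overflow/underflow). */
static void two_prod(double a, double b, double *p, double *e) {
    double ahi, alo, bhi, blo;
    *p = a * b;
    split(a, &ahi, &alo);
    split(b, &bhi, &blo);
    *e = ((ahi * bhi - *p) + ahi * blo + alo * bhi) + alo * blo;
}

/* "Sloppy" double-double addition: cheap, but can lose accuracy
   when a and b nearly cancel. */
static dd dd_add(dd a, dd b) {
    double s, e;
    dd r;
    two_sum(a.hi, b.hi, &s, &e);
    e += a.lo + b.lo;
    quick_two_sum(s, e, &r.hi, &r.lo);
    return r;
}

/* Double-double multiplication: exact hi*hi product plus cross terms. */
static dd dd_mul(dd a, dd b) {
    double p, e;
    dd r;
    two_prod(a.hi, b.hi, &p, &e);
    e += a.hi * b.lo + a.lo * b.hi;
    quick_two_sum(p, e, &r.hi, &r.lo);
    return r;
}
```

For example, squaring 1 + 2^-30 gives hi = 1 + 2^-29 and lo = 2^-60: the 2^-60 term that a plain double product rounds away is kept in the low word.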
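For reference, the four DST types can be written in the unnormalized convention FFTW uses for its RODFT00/RODFT10/RODFT01/RODFT11 kinds, for k = 0, ..., N-1. Whether VkFFT applies exactly this scaling is an assumption here; check its documentation before comparing outputs directly.

```latex
\begin{aligned}
\text{DST-I:}\quad   Y_k &= 2\sum_{j=0}^{N-1} X_j \sin\!\left(\frac{\pi (j+1)(k+1)}{N+1}\right) \\
\text{DST-II:}\quad  Y_k &= 2\sum_{j=0}^{N-1} X_j \sin\!\left(\frac{\pi (j+\frac{1}{2})(k+1)}{N}\right) \\
\text{DST-III:}\quad Y_k &= (-1)^k X_{N-1} + 2\sum_{j=0}^{N-2} X_j \sin\!\left(\frac{\pi (j+1)(k+\frac{1}{2})}{N}\right) \\
\text{DST-IV:}\quad  Y_k &= 2\sum_{j=0}^{N-1} X_j \sin\!\left(\frac{\pi (j+\frac{1}{2})(k+\frac{1}{2})}{N}\right)
\end{aligned}
```

In this convention DST-III inverts DST-II, and DST-I and DST-IV are their own inverses, each up to a scale factor of 2N (2(N+1) for DST-I).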
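The new out-of-place R2C default can be pictured with the usual R2C buffer arithmetic (the same convention FFTW uses): an in-place real row must be padded to hold n0/2 + 1 complex outputs, while an out-of-place real row can be dense. The helper names below are hypothetical; this shows the arithmetic only and does not reproduce VkFFT's configuration fields.

```c
#include <stddef.h>

/* Hypothetical helpers, not VkFFT API: layout of the real buffer of a
   real-to-complex (R2C) transform whose innermost axis has length n0. */

/* Complex outputs along the innermost axis (Hermitian symmetry). */
static size_t r2c_complex_count(size_t n0) {
    return n0 / 2 + 1;
}

/* Innermost stride of the real buffer, in real elements.
   In-place: rows are padded so they can hold the complex output,
   2 * (n0/2 + 1) reals (n0 + 2 when n0 is even).
   Out-of-place: input and output are separate buffers, so rows are dense. */
static size_t r2c_real_stride(size_t n0, int in_place) {
    return in_place ? 2 * r2c_complex_count(n0) : n0;
}

/* Real elements needed for an n0 x n1 real buffer. */
static size_t r2c_real_buffer_len(size_t n0, size_t n1, int in_place) {
    return r2c_real_stride(n0, in_place) * n1;
}
```

For n0 = 8, each complex row has 5 elements; an in-place real row occupies 10 real elements, while a dense out-of-place real row occupies 8. Code that relied on the old padded out-of-place layout would need to account for the change.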

-Added double-double support in VkFFT. It requires CPU initialization in full quad precision, so for now only GCC is supported. Full FP128 support, or another FP128 library (such as MPIR), may be added in the future.
-Data has to be stored in double-double before VkFFT kernel calls (there is no FP128<->double-double conversion on the GPU yet).
-Double-double gives ~1e-32 precision, but the same exponent range as FP64. See "Library for Double-Double and Quad-Double Arithmetic" by Y. Hida et al. for more information on double-double.
-Requires FMA contraction to be disabled (due to a rounding mismatch when ab-cd is contracted). This doesn't work on Vulkan yet, as I haven't found a way to disable contraction there.
-Fixed warnings (#138)
-Added a proper check that app is zero-initialized before the initializeVkFFT call, and zeroing of app on deletion (#134)
-Added an option to provide a staging buffer in the application and VkGPU handle (#129)
-Added guards for build type (#128)
-Fixed missing deallocation calls for the inverse Bluestein axes. Fixed the buffer layout size in Vulkan in some cases.
-Refactored the code generator and container struct layout for better handling of complex numbers (-5k lines of code).
-Added more precision tests and benchmarks.
-Will be merged into the main branch after more testing and an update to the documentation.
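One well-known illustration of how contracting an expression of the form ab-cd changes its result (we assume this is the kind of mismatch the note refers to): for z = x + iy, the imaginary part of z * conj(z) is y*x - x*y, which is exactly 0 when both products are rounded separately. If the compiler contracts it to fma(y, x, -(x*y)), the result is instead the rounding error of x*y. The sketch computes both; the contracted value is reproduced exactly with Dekker's TwoProduct, so the example needs neither hardware FMA nor libm. Function names are hypothetical.

```c
/* Hypothetical demo, not VkFFT code: effect of contracting y*x - x*y,
   the imaginary part of z * conj(z) for z = x + i*y. */

/* Veltkamp split; volatile keeps splitter * a rounded on its own,
   so this helper stays exact even if the compiler contracts elsewhere. */
static void split(double a, double *hi, double *lo) {
    const double splitter = 134217729.0; /* 2^27 + 1 */
    volatile double t = splitter * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker TwoProduct: p + e == a * b exactly (ignoring overflow/underflow).
   Products of split halves are exact, so contraction inside the sum
   below does not change e. */
static void two_prod(double a, double b, double *p, double *e) {
    double ahi, alo, bhi, blo;
    *p = a * b;
    split(a, &ahi, &alo);
    split(b, &bhi, &blo);
    *e = ((ahi * bhi - *p) + ahi * blo + alo * bhi) + alo * blo;
}

/* y*x - x*y with each product rounded and stored separately.
   IEEE multiplication is commutative, so this is 0 for finite inputs. */
static double im_z_conj_z_separate(double x, double y) {
    volatile double yx = y * x;
    volatile double xy = x * y;
    return yx - xy;
}

/* What a contracting compiler may compute instead: fma(y, x, -(x*y)),
   i.e. the exact product minus the rounded one, which is the exact
   rounding error of x*y returned by two_prod. */
static double im_z_conj_z_contracted(double x, double y) {
    double p, e;
    two_prod(y, x, &p, &e);
    return e;
}
```

With x = y = 1 + 2^-30, the separate evaluation returns 0 while the contracted one returns 2^-60.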

…uad double double
-Recalculated radix-7, 11 and 13 coefficients in higher precision; multithreaded Rader for 11 and 13 is better than single-threaded in quad double-double
-Fixed a mistake that kept the external FFT library tests from running correctly in multidimensional cases
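Rader's algorithm, used here for radix 11 and 13, turns a prime-length DFT into a cyclic convolution of length p-1 by reindexing with a primitive root g modulo p: X[0] is the plain sum of the inputs, and for m = 0..p-2, X[g^(-m)] = x[0] + sum over q of x[g^q] * w^(g^(q-m)), with w = exp(-2*pi*i/p). The minimal C sketch below computes only that index mapping. Names are hypothetical; it is the textbook reindexing, not VkFFT's generated kernels, and it says nothing about how VkFFT multithreads the convolution.

```c
#include <stdint.h>

/* base^exp mod m, for m < 2^32 so that every product fits in 64 bits. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Smallest primitive root of an odd prime p: g is a generator iff
   g^((p-1)/q) != 1 (mod p) for every prime factor q of p-1. */
static uint64_t primitive_root(uint64_t p) {
    uint64_t phi = p - 1;
    for (uint64_t g = 2; g < p; g++) {
        int ok = 1;
        uint64_t n = phi;
        for (uint64_t q = 2; q * q <= n && ok; q++) {
            if (n % q == 0) {
                if (pow_mod(g, phi / q, p) == 1) ok = 0;
                while (n % q == 0) n /= q;
            }
        }
        /* A remaining n > 1 is the largest prime factor of p-1. */
        if (ok && n > 1 && pow_mod(g, phi / n, p) == 1) ok = 0;
        if (ok) return g;
    }
    return 0; /* not reached for an odd prime p */
}

/* Rader reindexing tables for prime p with primitive root g:
   in_idx[q]  = g^q  mod p  (inputs read as x[g^q])
   out_idx[m] = g^-m mod p  (outputs written as X[g^-m]), q, m = 0..p-2. */
static void rader_tables(uint64_t p, uint64_t g,
                         uint64_t *in_idx, uint64_t *out_idx) {
    uint64_t n = p - 1;
    for (uint64_t k = 0; k < n; k++) {
        in_idx[k] = pow_mod(g, k, p);
        out_idx[k] = pow_mod(g, (n - k) % n, p); /* g^-k == g^(p-1-k) */
    }
}
```

For p = 7 the smallest primitive root is 3, so inputs are visited in the order 1, 3, 2, 6, 4, 5 and outputs in the order 1, 5, 4, 6, 2, 3.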

anarkiwi and others added 16 commits September 12, 2023 23:26

Allow specifying glslang version (e.g. cmake -DGLSLANG_GIT_TAG=13.0.0…
… ..). Current default is main, which changes often and can lead to non-reproducible builds.

505de65

Merge pull request #135 from anarkiwi/gtag2

b4ae141

Allow specifying glslang version (e.g. cmake -DGLSLANG_GIT_TAG=13.0.0…

Added the optimization for MUL codelet that I forgot in the original …
…upload (+30% performance)

4bea811

Changed default innermost stride for real buffers in out-of-place R2C…
… from size[0]+2 to size[0] -ref: #139

2b4d63e

Added quadDoubleDoublePrecisionDoubleMemory to use double for memory …
…and quad double-double for computations

79a48e3

Enable rader FFT algorithm in quad double-double precision, performan…
…ce optimizations

b43c618

Small performance improvement for AMD CDNA2 quad double-double

450c806

Bugfix for kernel caching of kernels using Rader's algorithm (this is…
… also present in v1.3.1)

0152796

fix half precision definitions in Vulkan

8fcaa4b

Fixed r2r quad double double, simplified pf_quad to double2

e76c3dc

Added option to store complex data separately(RRR III) in shared memo…
…ry - helps in quad double double precision

ee003da

Quad double-double bugfix

3c60386

Added DST I-IV support

bc4f3e0

Final 1.3.2 pr before merge
-Updated documentation

f9b0ac9

DTolm merged commit 3fa0c21 into master Oct 23, 2023
