Is your feature request related to a problem? Please describe.
The STF model provides methods to generate CUDA kernels (parallel_for and launch) in addition to orchestrating asynchronous computation.
This is not needed for applications that provide their own kernels or rely on libraries, so removing these features might speed up compilation significantly. More importantly, these code generation features require specific CUDA compiler flags to enable extended lambda functions and device constexpr functions (--expt-relaxed-constexpr --extended-lambda), and those flags are prohibited in some cases.
Describe the solution you'd like
We should therefore be able to disable parallel_for and launch when including STF. They are already disabled automatically for non-CUDA compilers, but we may still want to disable them for nvcc.
Describe alternatives you've considered
No response
Additional context
No response
I suppose the best option is to define a cudax_ENABLE_CUDASTF_CODE_GENERATION flag. In our presets, that would be set to true by default on nvcc, and false otherwise.
Applications may set that flag directly in their CMake config, or define -DNO_CUDASTF_CODE_GENERATION manually when using make.
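To make the proposal concrete, here is a minimal sketch of how a header could gate the kernel-generating entry points behind the opt-out macro. It assumes the NO_CUDASTF_CODE_GENERATION macro suggested above; STF_SKETCH_HAS_CODE_GENERATION, context_sketch and code_generation_enabled are hypothetical names used only for illustration, and the parallel_for body is a host-side placeholder rather than the real STF implementation.

```cpp
#include <cstddef>
#include <utility>

// NO_CUDASTF_CODE_GENERATION is the opt-out macro proposed in this thread.
// STF_SKETCH_HAS_CODE_GENERATION is a hypothetical internal macro (assumption).
// Code generation is on only when compiling with nvcc and not opted out.
#if defined(__CUDACC__) && !defined(NO_CUDASTF_CODE_GENERATION)
#  define STF_SKETCH_HAS_CODE_GENERATION 1
#else
#  define STF_SKETCH_HAS_CODE_GENERATION 0
#endif

// Hypothetical stand-in for an STF context; this is not the real cudax API.
struct context_sketch
{
  // Orchestration-only entry point: always available, needs no special flags.
  template <typename F>
  void task(F&& f)
  {
    std::forward<F>(f)();
  }

#if STF_SKETCH_HAS_CODE_GENERATION
  // Kernel-generating entry point: only declared when code generation is on.
  // The real parallel_for would build a CUDA kernel from an extended lambda;
  // this placeholder simply runs a host loop.
  template <typename F>
  void parallel_for(std::size_t n, F&& f)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      f(i);
    }
  }
#endif
};

// Lets user code query the configuration at compile time.
constexpr bool code_generation_enabled()
{
  return STF_SKETCH_HAS_CODE_GENERATION != 0;
}
```

With this layout, a translation unit built by a host compiler, or by nvcc with -DNO_CUDASTF_CODE_GENERATION, never sees the declarations that would need --extended-lambda or --expt-relaxed-constexpr, while task-based orchestration stays available.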
Is this a duplicate?
Area
CUDA Experimental (cudax)