Hello!
I have been looking into the internals of cuBLASDx and am really impressed by the layout database abstraction. I work with the CUTLASS CollectiveMMA, and it takes an expert to pick the template parameters (TiledMMA, swizzle layouts and copy operations) that give peak performance.
The cuBLASDx layout database seems to have solved this problem (with some qualification): the user only specifies a high-level GEMM description, which the framework maps to optimal template configurations.
However, as of today, only the smem layouts are exposed, via suggest_smem_layout(). That is understandable, as this layout is necessary for constructing the tensor arguments passed to the execute(...) method.
My ask is for a different use case: leveraging cuBLASDx GEMM descriptions and optimal configurations directly in CUTLASS methods. It would be very helpful if the internally selected TiledMMA and copy operations were exposed as well.
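To make the request concrete, here is a rough sketch of the kind of interface I have in mind. It is plain C++ with no cuBLASDx or CUTLASS dependency, and every name in it (GemmDescription, ConfigDatabase, suggested_tiled_mma_t, the tag types) is hypothetical and not an existing API. The selected values are placeholders, not real tuning data. It only shows the shape of the idea: a compile-time lookup from a high-level description to the MMA and copy types, next to the layout information.

```cpp
#include <type_traits>

// NOTE: all names here are hypothetical placeholders for illustration.
// None of them are cuBLASDx or CUTLASS identifiers.

struct half_tag {};   // stands in for a 16-bit floating point element type
struct float_tag {};  // stands in for a 32-bit floating point element type

// High-level GEMM description: tile sizes, element type, target architecture.
template <int M, int N, int K, class Element, int Arch>
struct GemmDescription {
    static constexpr int m = M;
    static constexpr int n = N;
    static constexpr int k = K;
    static constexpr int arch = Arch;
    using element = Element;
};

// Placeholder tags standing in for a TiledMMA and a copy operation.
struct FallbackMma {};
struct TensorCoreF16Mma {};
struct ScalarCopy {};
struct Vectorized128Copy {};

// "Database": the primary template is the generic fallback entry.
template <class Desc, class Enable = void>
struct ConfigDatabase {
    using tiled_mma = FallbackMma;
    using smem_copy = ScalarCopy;
    static constexpr int swizzle_bits = 0;  // placeholder value
};

// A specialized entry: half-precision GEMMs on architecture 80 or newer.
template <int M, int N, int K, int Arch>
struct ConfigDatabase<GemmDescription<M, N, K, half_tag, Arch>,
                      std::enable_if_t<(Arch >= 80)>> {
    using tiled_mma = TensorCoreF16Mma;
    using smem_copy = Vectorized128Copy;
    static constexpr int swizzle_bits = 3;  // placeholder value
};

// The accessors this request asks for, next to the existing layout query.
template <class Desc>
using suggested_tiled_mma_t = typename ConfigDatabase<Desc>::tiled_mma;

template <class Desc>
using suggested_smem_copy_t = typename ConfigDatabase<Desc>::smem_copy;

template <class Desc>
inline constexpr int suggested_swizzle_bits_v = ConfigDatabase<Desc>::swizzle_bits;
```

With something like this, a CUTLASS user could write `using Mma = suggested_tiled_mma_t<GemmDescription<128, 128, 32, half_tag, 80>>;` and feed the result into their own collective, instead of hand-picking each parameter.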
Thanks!