-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve setting launching bounds on preset kernels #64
base: main
Are you sure you want to change the base?
Conversation
- Replace metadata if already set (CUDA) - Add test kernel_preset_bounds
// Replace if the metadata exists. | ||
for (auto *MetadataNode : NvvmAnnotations->operands()) { | ||
// Expecting 3 operands ptr, desc, i32 value. | ||
assert(MetadataNode->getNumOperands() == 3); | ||
|
||
auto *PtrMetadata = MetadataNode->getOperand(0).get(); | ||
auto *DescMetadata = MetadataNode->getOperand(1).get(); | ||
if (PtrMetadata == FuncMetadata && MaxntidxMetadata == DescMetadata) { | ||
MetadataNode->replaceOperandWith(2, MaxThreadsMetadata); | ||
return; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what we want? Are we going to override the developers intend? This sounds as bad practice. @davidbeckingsale what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are overriding with a value that is correct to achieve greater optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't know that though. What if the user has auto-tuned the code and knows that these values optimize the code? There is no guarantee on what you do there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to turn off this optimization with an env var. Down the road we can add annotation extensions to turn it on/off per kernel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is not intuitive. The developer says:
__global__ __attribute__((annotate("jit")))
__launch_bounds__(128, 4)
this directly translates to explicit control to turn it on/off per kernel with no additional support from us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User can be mistaken, Proteus isn't 😝
#include "gpu_common.h" | ||
|
||
__global__ __attribute__((annotate("jit"))) | ||
__launch_bounds__(128, 4) void kernel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall the semantics. Is it always mandatory to pass the second value? If it is NOT you should add a test for such a case. Cause I am guessing this:
assert(MetadataNode->getNumOperands() == 3);
will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further, I would consider having the bounds as <1, 1>. To match the call site and indirectly test case in which proteus does not call the function correctly (for whatever reason).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not mandatory. The assert
will not fail because this is the format of the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using different launch bounds than the actual launch dimensions because I can visually check the IR if the optimization happened. Checking will be automated in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Nits
Closes #27