Improve setting launch bounds on preset kernels #64

Open · wants to merge 2 commits into base: main
Conversation

ggeorgakoudis (Collaborator):
  • Replace metadata if already set (CUDA)
  • Add test kernel_preset_bounds

Closes #27

Comment on lines +129 to +140
// Replace if the metadata exists.
for (auto *MetadataNode : NvvmAnnotations->operands()) {
// Expecting 3 operands: ptr, desc, i32 value.
assert(MetadataNode->getNumOperands() == 3);

auto *PtrMetadata = MetadataNode->getOperand(0).get();
auto *DescMetadata = MetadataNode->getOperand(1).get();
if (PtrMetadata == FuncMetadata && MaxntidxMetadata == DescMetadata) {
MetadataNode->replaceOperandWith(2, MaxThreadsMetadata);
return;
}
}
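For context on the loop above, here is a sketch of the module-level metadata that clang's NVPTX lowering typically emits for `__launch_bounds__(128, 4)` (kernel name hypothetical; the exact form can vary across LLVM versions):

```llvm
; Each !nvvm.annotations entry is a 3-operand triple:
; function pointer, property name, i32 value.
!nvvm.annotations = !{!0, !1}
!0 = !{ptr @kernel, !"maxntidx", i32 128} ; max threads per block
!1 = !{ptr @kernel, !"minctasm", i32 4}   ; min blocks per multiprocessor
```

The loop matches on the function pointer and the `maxntidx` descriptor, then replaces operand 2 in place instead of appending a duplicate entry.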
Collaborator:
Is this what we want? Are we going to override the developer's intent? This sounds like bad practice. @davidbeckingsale what do you think?

Collaborator (Author):

Yes, we are overriding it with a value that is correct, to achieve greater optimization.

Collaborator:

You don't know that, though. What if the user has auto-tuned the code and knows that these values optimize it? There is no guarantee in what you do there.

Collaborator (Author):

It's possible to turn off this optimization with an env var. Down the road, we can add annotation extensions to turn it on/off per kernel.

Collaborator:

IMO this is not intuitive. The developer writes:

__global__ __attribute__((annotate("jit")))
__launch_bounds__(128, 4)

and this translates directly to explicit per-kernel control to turn it on/off, with no additional support needed from us.

Collaborator (Author):

The user can be mistaken; Proteus isn't 😝

#include "gpu_common.h"

__global__ __attribute__((annotate("jit")))
__launch_bounds__(128, 4) void kernel() {
Collaborator:

I don't recall the semantics. Is it always mandatory to pass the second value? If it is NOT, you should add a test for that case, because I am guessing this:

assert(MetadataNode->getNumOperands() == 3);

will fail.

Collaborator:

Further, I would consider setting the bounds to <1, 1>, to match the call site and indirectly test the case in which Proteus does not call the function correctly (for whatever reason).

Collaborator (Author):

No, it's not mandatory. The assert will not fail, because this is always the format of the metadata.
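To illustrate the point (a sketch, assuming clang's usual NVPTX lowering; kernel name hypothetical): with `__launch_bounds__(128)` alone, no `minctasm` entry is emitted at all, but the `maxntidx` entry that is emitted is still a full 3-operand triple, so the assert holds:

```llvm
; Second launch_bounds value omitted: one entry, still 3 operands.
!nvvm.annotations = !{!0}
!0 = !{ptr @kernel, !"maxntidx", i32 128}
```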

Collaborator (Author):

I'm using launch bounds different from the actual launch dimensions so that I can visually check in the IR whether the optimization happened. The check will be automated in the future.
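One way such a check could be automated, sketched with LLVM's FileCheck (the pattern and kernel name are hypothetical, not part of this PR):

```llvm
; Dump the JIT-compiled module's IR in the test, then verify the
; launch-bounds metadata was overridden to the expected values:
; CHECK: !{ptr @kernel, !"maxntidx", i32 128}
; CHECK: !{ptr @kernel, !"minctasm", i32 4}
```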

koparasy (Collaborator) left a review:

Nice work. Nits

Successfully merging this pull request may close these issues.

Test setting launch bounds on a kernel with preset ones