Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find a way to set OpenCL events #39

Open
yves-surrel opened this issue Jun 6, 2021 · 3 comments · May be fixed by #70
Open

Find a way to set OpenCL events #39

yves-surrel opened this issue Jun 6, 2021 · 3 comments · May be fixed by #70

Comments

@yves-surrel
Copy link

In the recent OpenCL backend, the kernel is enqueued with:

result = clEnqueueNDRangeKernel(app->configuration.commandQueue[0], axis->kernel, 3, 0, global_work_size, local_work_size, 0, 0, 0);

i.e. the last three arguments are set to 0. It would be nice to set/have access to these event arguments, which are helpful for synchronization and profiling.

@zachjweiner
Copy link

@DTolm, I was thinking this might be handled best by adding num_events_in_wait_list, event_wait_list, and event members to VkFFTLaunchParams, following the API. The first two denote the events that the first kernel should wait on (i.e., should only be passed at the first kernel invocation). The last one should be passed to the final kernel invocation so that the user can query its status after that last kernel asynchronously returns.

Along these lines, I think it would be best practice to propagate events between each kernel invocation (within a single transform) to ensure synchronization. I'm not familiar with the internals of VkFFT, but one could imagine an out-of-order queue launching a kernel too early.

If this sounds appropriate to you and it would be helpful, I would be willing to work on a PR. Should be pretty simple.

@DTolm
Copy link
Owner

DTolm commented Dec 2, 2021

@zachjweiner
I don't have much experience with OpenCL events, but they should be similar to what Vulkan has with synchronization when commandBuffer is created, but for kernel dispatches. You can do this, or wait until I finish another code reorganization and split it into multiple headers, it should be easier to understand then.

@zachjweiner
Copy link

Taking a closer look, I see now that dispatchEnhanced isn't passed launchParams. I suppose the "right" thing to do (based on a cursory look at the general internal workflow of VkFFTAppend) would be to set

app->configuration.num_events_in_wait_list = launchParams->num_events_in_wait_list;
app->configuration.event_wait_list = launchParams->event_wait_list;
app->configuration.event = launchParams->event;

(and as usual ensure these are zero unless the user sets them).
On the first kernel invocation do the equivalent of

result = clEnqueueNDRangeKernel(app->configuration.commandQueue[0], axis->kernel, 3, 0, global_work_size, local_work_size, app->configuration.num_events_in_wait_list, app->configuration.event_wait_list, 0);

and, on the last,

result = clEnqueueNDRangeKernel(app->configuration.commandQueue[0], axis->kernel, 3, 0, global_work_size, local_work_size, 0, 0, app->configuration.event);

Since VkFFTSync is a no-op on the OpenCL backend (which is good), VkFFTAppend should release asynchronously with the execution of the final kernel, letting the user wait on their launchParams->event. (Right now, one technically has to call clFinish to ensure that kernel finishes before they use the buffers.) Vice-versa, having the first kernel wait on user-supplied events means calling clFinish before VkFFTAppend also is no longer required.

Separately, it would be good to have an internal cl_event between kernel launches. (I'm actually not sure what, if anything, is enforcing synchronization between them at the moment, since the clSetKernelArg calls shouldn't be blocking.) This should allow the host-side work, plus the enqueuing of the subsequent kernel, to occur asynchronously with the previously-enqueued kernel. I'd think removing that overhead would yield a modest (but not negligible, especially for smaller multi-D transforms) performance boost.

@isuruf isuruf linked a pull request May 1, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants