I'm new to CUDA and I ran into a race condition which could perhaps be prevented with changes to the documentation.
The problem
Mixing the default stream and a custom stream isn't a good idea.
The implementations of the CopyDestination trait implicitly use the default stream. When you launch a kernel on a stream that was created with the NON_BLOCKING flag, the kernel is not implicitly synchronized with those copies, which can lead to a race condition.
My confusion
The documentation of the NON_BLOCKING stream flag has a good explanation of the default (NULL) stream. However, this passage:
Since RustaCUDA does not provide access to the NULL stream, this flag has no effect in
most circumstances. However, it is recommended to use it anyway, as some other crate
in this binary may be using the NULL stream directly.
made me believe that as long as I only use RustaCUDA, I should enable NON_BLOCKING and everything would be fine, because the library itself never uses the default stream. As mentioned above, that is not true.
For me there were two solutions:
Do not set the NON_BLOCKING stream flag. That way, even if I launch the kernel on a custom stream (there is currently no way in RustaCUDA to launch it on the default stream), the custom stream is properly synchronized with the default stream used by the copies.
Use the async copy methods on the same stream I launch the kernel on, and synchronize the stream right after the copy operation (that's what I did).
Proposed fix
I propose adding a warning or note to the NON_BLOCKING stream flag documentation stating that the synchronous copy methods use the default stream, so this flag can affect how they interact with custom streams. Additionally, I'd add information about the default stream to the CopyDestination trait itself.