Partitioned communications hang #12969
The inline patch below fixes the issue. I will review it and open a proper PR sometime early this week.
diff --git a/ompi/mca/part/persist/part_persist.h b/ompi/mca/part/persist/part_persist.h
index eea447c..07043b0 100644
--- a/ompi/mca/part/persist/part_persist.h
+++ b/ompi/mca/part/persist/part_persist.h
@@ -485,9 +485,8 @@ mca_part_persist_start(size_t count, ompi_request_t** requests)
 {
     int err = OMPI_SUCCESS;
     size_t _count = count;
-    size_t i;

-    for(i = 0; i < _count && OMPI_SUCCESS == err; i++) {
+    for(size_t i = 0; i < _count && OMPI_SUCCESS == err; i++) {
         mca_part_persist_request_t *req = (mca_part_persist_request_t *)(requests[i]);
         /* First use is a special case, to support lazy initialization */
         if(false == req->first_send)
@@ -503,7 +502,7 @@ mca_part_persist_start(size_t count, ompi_request_t** requests)
         } else {
             if(MCA_PART_PERSIST_REQUEST_PSEND == req->req_type) {
                 req->done_count = 0;
-                for(i = 0; i < req->real_parts && OMPI_SUCCESS == err; i++) {
+                for(size_t i = 0; i < req->real_parts && OMPI_SUCCESS == err; i++) {
                     req->flags[i] = -1;
                 }
             } else {
@mdosanjh I noticed there is a lot of code in a header file, which is not friendly for debuggers.
Hey @ggouaillardet - I can't see the entire code, but doesn't that create a shadowed variable? Wouldn't it be wise to rename the second counter to something other than "i"?
Sure, we can also do that!
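For readers following along, here is a minimal standalone sketch of why sharing one index between the two loops is a problem, and what the separate-index variant looks like. This is not Open MPI code: `toy_request_t`, `start_shared_index` and `start_separate_index` are hypothetical names made up for illustration, and only the loop structure mirrors `mca_part_persist_start`.

```c
#include <stddef.h>

/* Hypothetical stand-in for a partitioned request (illustration only). */
typedef struct {
    size_t real_parts; /* number of partitions to reset */
    int started;       /* set once the request has been processed */
} toy_request_t;

/* Pre-patch shape: the inner loop reuses the outer loop's index. */
size_t start_shared_index(size_t count, toy_request_t *reqs)
{
    size_t started = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        toy_request_t *req = &reqs[i];
        /* This loop leaves i == req->real_parts, clobbering the outer counter. */
        for (i = 0; i < req->real_parts; i++) {
            /* per-partition reset would go here */
        }
        req->started = 1;
        started++;
    }
    return started;
}

/* Separate-index shape: the inner loop has its own counter, no shadowing. */
size_t start_separate_index(size_t count, toy_request_t *reqs)
{
    size_t started = 0;
    for (size_t i = 0; i < count; i++) {
        toy_request_t *req = &reqs[i];
        for (size_t p = 0; p < req->real_parts; p++) {
            /* per-partition reset would go here */
        }
        req->started = 1;
        started++;
    }
    return started;
}
```

In this toy, with 3 requests of 4 partitions each, the shared-index version processes only the first request, because the outer loop resumes at index 5. If `real_parts + 1` is smaller than `count`, the same function never terminates, since the outer index is reset on every pass. Declaring `size_t i` in both `for` statements, as the inline patch does, also avoids the clobbering, but the inner `i` then shadows the outer one, which is why a differently named counter reads more clearly.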
use a separate loop index for the innermost loop. Fixes open-mpi#12969 Signed-off-by: Gilles Gouaillardet <[email protected]>
Thank you for creating the issue and proposing a fix so quickly!
@Jonashar can you please tell me your full name so I can update the commit message and properly credit you?
Oh sorry, of course, my name is Jonas Harlacher.
use a separate loop index for the innermost loop. Thanks Jonas Harlacher for bringing this to our attention. Fixes open-mpi#12969 Signed-off-by: Gilles Gouaillardet <[email protected]>
This issue was initially reported on Stack Overflow at https://stackoverflow.com/questions/79258925/openmpi5-partitioned-communication-with-multiple-neighbors
With the main branch, the following program hangs when run on 3 MPI tasks: