Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump required versions of OpenPMIX and PRTTE #11875

Merged
merged 1 commit into from
Oct 16, 2023

Conversation

qkoziol
Copy link
Contributor

@qkoziol qkoziol commented Aug 24, 2023

EDIT by @jsquyres: This PR has changed considerably since it was originally filed. Below is the new commit message, which describes what this PR now is.


docs: Better explain required versions of support libraries

Get rid of the table that listed the required support library versions -- the table format was too limiting. Instead, add a bunch more verbiage about the versions that are included in the Open MPI distribution tarball and the minimum required versions of each of hwloc, libevent, PMIx, and PRRTE. Pay special attention to the corner cases of building PMIx (internal and external), with and without PRRTE.

Also set the minimum required versions for PMIx and PRRTE in VERSIONS:

  • Testing shows that Open MPI v5.0.x requires at least PMIx v4.2.0
  • The minimum required version for PRRTE is v3.0.0, but we recommend in the docs that users use >=v3.0.1 so that they get a full mpirun(1) man page

Finally, also show in the docs the versions of the embedded packages (hwloc, libevent, PMIx, and PRRTE). This required adding a little Python in docs/conf.py to read VERSION files and extract version numbers from tarball filenames.


Fixes #10657

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 24, 2023

@rhc54 please review

@qkoziol qkoziol requested a review from janjust August 24, 2023 17:10
Copy link
Contributor

@lrbison lrbison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on discussion in issue #10657, this change looks good.

@awlauria
Copy link
Contributor

PRRTe v3.0.1 is unreleased - so we'll want to hold off merging until it is, right?

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 24, 2023

PRRTe v3.0.1 is unreleased - so we'll want to hold off merging until it is, right?

I've asked Ralph about his opinions around this. I'd like to list 3.0.1 as the minimum, ask Ralph to release 3.0.1, update the submodule pointer for PRRTE, then make the next rc.

@rhc54
Copy link
Contributor

rhc54 commented Aug 25, 2023

PRRTE v3.0.1 is on "hold" pending completion of the docs refactoring being done by @jsquyres

@qkoziol
Copy link
Contributor Author

qkoziol commented Aug 25, 2023

PRRTE v3.0.1 is on "hold" pending completion of the docs refactoring being done by @jsquyres

OK, that doc work is also a blocker for the OMPI 5.0 release. So, we should be able to sequence things:

  • Finish doc refactoring and merge to both PRRTE and OMPI repos
  • PRRTE 3.0.1 release
  • OMPI bumps the submodule pointers to include the PRRTE 3.0.1 release
  • Merge this PR to OMPI
  • Make the next OMPI rc tarball

@awlauria - Yes?

@awlauria
Copy link
Contributor

@qkoziol yes. Sounds right.

@jsquyres
Copy link
Member

This submodule update for main is now about a month old -- is it still relevant? I think someone should double check the submodule pointers here before merging this PR -- it may be stale/out of date at this point.

@awlauria
Copy link
Contributor

This isn't a submodule update - it's just updating the version file.

@lrbison
Copy link
Contributor

lrbison commented Sep 28, 2023

@jsquyres do you mean to make sure that main is still able to build based on the (relatively) old versions of prrte and pmix on ompi main?

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(...crawling back into my hole...)

@jsquyres
Copy link
Member

@jsquyres do you mean to make sure that main is still able to build based on the (relatively) old versions of prrte and pmix on ompi main?

No. I just got myself confused on by all the submodule PRs that were opened / closed / commented on in the past 24 hours. Somehow I thought this was one of those, but it's not -- it's a good VERSION file update.

...but as I say that, now I'm wondering if I remember how VERSION is used, because I realize I don't remember. Are the PMIx and PRTE versions in VERSION used to compare in configure and ignore PMIx / PRTE installations that are found below the versions in VERSIONS? Or are they used some other way? More bluntly: how are the PMIx / PRTE versions in VERSIONS used?

@awlauria
Copy link
Contributor

awlauria commented Sep 28, 2023

At the very least - these VERSION are propagated to the docs here automatically:

https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html

I think it is also enforced in ompi configure:

https://github.com/open-mpi/ompi/blob/v5.0.x/config/opal_config_pmix.m4#L195

though I admit I have not tried it..in a long time..if ever. I can take the action to see that is the case for pmix/prrte after this gets in.

@rhc54
Copy link
Contributor

rhc54 commented Sep 28, 2023

At least in PMIx/PRRTE themselves, I don't believe they are used in the configure logic. In fact, I just discovered that the min version of PMIx shown in the PRRTE VERSION file was wrong and configure was looking at a different value! Fixed it prior to release just to keep it clean.

IIRC, someone put them there (wasn't me) to help collect the info in one place for ease of reference. Not sure if something actually uses those values or not - if not, then we probably should remove them as they get stale and will confuse people.

@awlauria
Copy link
Contributor

They definitely should not be removed as it is used for doc generation.

It looks like it is enforced in the ompi configure logic, but I can check.

Copy link
Member

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed that OMPI 5.0.0 no longer works with PMIx 4.1.2 or PRRTE 2.0.2? Not has all features, but works with?

The VERSION file versions are not the versions that are shipped with OMPI, but the minimum versions that this version of OMPI supports (internal or external).

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now that I'm remembering what these versions are used for, I agree with @bwbarrett (and shame on me for not checking before I blindly approved the PR; bad Jeff!!).

If we want to document minimum and recommended versions, that's a different thing. But right now, these are minimum supported versions.

@rhc54
Copy link
Contributor

rhc54 commented Sep 28, 2023

right now, these are minimum supported versions.

Well, yes and no. The problem you still need to address is min PMIx required when building with PRRTE vs without PRRTE. The two are quite different - or at least, they can be quite different if you protect code regions that require more advanced PMIx definitions.

I don't believe your configure deals with that just yet.

@rhc54
Copy link
Contributor

rhc54 commented Sep 28, 2023

The VERSION file versions are not the versions that are shipped with OMPI, but the minimum versions that this version of OMPI supports (internal or external).

Gets a little tricky, doesn't it? If I specify an external PMIx but don't build the internal PRRTE against it, then that would be one minimum version. If I specify an external PMIx and do build the internal PRRTE, then that is a different minimum version.

Bottom line: I don't believe the current VERSION values truly convey the correct information. It isn't a single value, but more like a matrix.

@qkoziol
Copy link
Contributor Author

qkoziol commented Sep 28, 2023

right now, these are minimum supported versions.

Well, yes and no. The problem you still need to address is min PMIx required when building with PRRTE vs without PRRTE. The two are quite different - or at least, they can be quite different if you protect code regions that require more advanced PMIx definitions.
I don't believe your configure deals with that just yet.

I don't think it has to handle this case. The minimum PMIx version should be the without PRRTE value. That's what OMPI needs as a PMIx version. The user won't be able to build PRRTE if PMIx is too old for PRRTE, so we won't get into a bad corner case ; there's no way the customer will have a working PRRTE to point us at if the PMIx is too old for PRRTE.

I will check what version of PMIx is required for the non-PRRTE build.

For a non-PRRTE build, OMPI can accept versions of PMIx as old as 2.0.2
So, that implies that we could leave the PMIx version # alone here and just bump the PRRTE version, along with a note saying that PMIx v4.2.x is required when building with PRRTE.
Sound OK? @bwbarrett @jsquyres @awlauria @rhc54

I think you mean PMIx v4.1 correct?

PMIx 2.0.2 is ... old.

Yeah, I was surprised that OMPI would work with a version that old also. As I said, I think we should leave the minimum version for PMIx in the file alone (at 4.1.x) and update only the PRRTE version and the documentation in the direction that Ralph mentioned (about external PRRTE, etc).

@rhc54
Copy link
Contributor

rhc54 commented Sep 28, 2023

I was saying that we could leave the PMIx version alone (at 4.1.x) and change this PR to only modify the PRRTE version. (And update the doc)

Sorry - I was only using 2.0 as an example as that is what you had cited as min that worked. My point remains the same: I cannot support PMIx versions older than v4.2.x. I grok that you might want to quote a lower minimum, and that's fine - I get it. Unfortunately, "works with" implies "is supported". I only want to ensure we are all on the same page that OMPI will be taking responsibility for responding to issues involving OMPI v5 working with PMIx versions older than v4.2.x.

No different than what you now do for OMPI v4.x with respect to ORTE and PMIx v3.x. Just don't want any future misunderstandings when I refuse issues filed by OMPI users that ran with older PMIx versions. I will at most suggest that they upgrade and try again, which I suspect is probably where you might also start.

@qkoziol
Copy link
Contributor Author

qkoziol commented Sep 28, 2023

I was saying that we could leave the PMIx version alone (at 4.1.x) and change this PR to only modify the PRRTE version. (And update the doc)

Sorry - I was only using 2.0 as an example as that is what you had cited as min that worked. My point remains the same: I cannot support PMIx versions older than v4.2.x. I grok that you might want to quote a lower minimum, and that's fine - I get it. Unfortunately, "works with" implies "is supported". I only want to ensure we are all on the same page that OMPI will be taking responsibility for responding to issues involving OMPI v5 working with PMIx versions older than v4.2.x.

No different than what you now do for OMPI v4.x with respect to ORTE and PMIx v3.x. Just don't want any future misunderstandings when I refuse issues filed by OMPI users that ran with older PMIx versions. I will at most suggest that they upgrade and try again, which I suspect is probably where you might also start.

Yep, I'm totally fine with what you say. Just need consensus from the core & RM devs who have commented on this ticket and requested changes. (@bwbarrett & @jsquyres , at this point)

@bwbarrett
Copy link
Member

I mostly disagree with Ralph here; unless we have reason to, we shouldn't move the required versions. Long term, it's going to be bad mojo to push customers to update what we hope is a system library. If we know something is wrong with an old version and new OMPI, then sure, we should update. But similar to Libevent or hwloc, there is a huge cost to our customers to unnecessarily move the required version.

@awlauria
Copy link
Contributor

Recall that the prrte 2.0 command line is completely different, and probably broken from what is in 3.0. It's likely anyone using a 2.0 prrte will have various command line issues, and will probably raise these issues in ompi. To which our reply will always be..upgrade to v3.0.1. This just screams unnecessary headache to me.

@bwbarrett
Copy link
Member

I'm not sure I agree on the prrte 2.0, as you're using someone's existing setup, but I have fewer complaints there than with PMIX. However, I can't see why we'd say 3.0.0 doesn't work.

@awlauria
Copy link
Contributor

Sure, we can move this to prrte v3.0 instead of v3.0.1.

However, it says in the prrte v3.0 release notes:

[PRRTE v3.0.0](https://github.com/openpmix/prrte/releases/tag/v3.0.0)
This is the latest production release of the PRRTE software system. It
contains full support of the PMIx v4.2 Standard plus extensions as provided
by OpenPMIx v4.2. **It requires support from the OpenPMIx v4.2.2 release or above.**

@jsquyres
Copy link
Member

We might benefit from a 30-min webex to sort this all out (Vs. a zillion more comments here on a very detailed, confusing topic).

Also, I'd like to remind everyone that the PMIX/PRTE version numbers in VERSION were solely put there so that we could easily substitute what are currently called the "minimum versions" in https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html. We can easily change the contents of VERSION to contain multiple values (e.g., minimum version and recommended version) + update the table, and/or update other text in the docs.

My point: let's decide what we want to say -- about minimum versions, about supported versions, about recommended versions, about ... whatever we want -- and then we can adjust VERSION and the docs to match.

@bwbarrett
Copy link
Member

Call good. Especially since I disagree with something Jeff said about VERSION in his call for a call :).

@awlauria
Copy link
Contributor

I'm fine with bumping this to prrte v3.0.0 (instead of v3.0.1) and documenting recommended versions.

@rhc54
Copy link
Contributor

rhc54 commented Sep 29, 2023

Looks like you may have a bug in your OMPI v5 PMIx configure code if you intend to support older PMIx versions:

$ ./configure --prefix=/Users/rhc/openmpi/build/v5.0 --with-libevent=/Users/rhc/libevent/build --with-hwloc=/Users/rhc/hwloc/build/v2.7.0 --with-pmix=/Users/rhc/pmix/build/v3.0 --without-prte

...

*** Configuring PMIx
checking for pmix pkg-config name... /Users/rhc/pmix/build/v3.0/lib/pkgconfig/pmix.pc
checking if pmix pkg-config module exists... no
checking for pmix wrapper compiler... /Users/rhc/pmix/build/v3.0/bin/pmixcc
checking if pmix wrapper compiler works... no
checking for pmix header at /Users/rhc/pmix/build/v3.0/include... found
checking for pmix library (pmix) in /Users/rhc/pmix/build/v3.0... found -- lib
checking for pmix cppflags... -I/Users/rhc/pmix/build/v3.0/include
checking for pmix ldflags... -L/Users/rhc/pmix/build/v3.0/lib
checking for pmix libs... -lpmix
checking for pmix static libs... -lpmix
checking for pmix.h... yes
checking for PMIx_Init... no
configure: error: External PMIx requested but not found.

Note that the pkg-config directory was introduced in PMIx v3.2 and pmixcc doesn't appear until v4.0. So I fail to understand this statement from @qkoziol:

For a non-PRRTE build, OMPI can accept versions of PMIx as old as 2.0.2

OMPI's v5 configure will fail for anything that old, AFAICT. Perhaps a misspeak?

@rhc54
Copy link
Contributor

rhc54 commented Sep 29, 2023

I'm fine with bumping this to prrte v3.0.0 (instead of v3.0.1) and documenting recommended versions.

Just a reminder, in case you've forgotten: PRRTE v3.0.0 does not have your mpirun man page, nor any of the revised doc system files by which you could reconstruct it. So users will be limited to the --help system. It also has a rather different mpirun cmd line then you eventually settled on, and there were quite a few bug patches between v3.0.0 and v3.0.1.

So you may need to carefully define what you mean by "minimum required" for PRRTE. Note that a user could probably launch their job using ORTE from an earlier version of OMPI, if that's the only requirement.

@awlauria
Copy link
Contributor

Looks like you may have a bug in your OMPI v5 PMIx configure code if you intend to support older PMIx versions:

$ ./configure --prefix=/Users/rhc/openmpi/build/v5.0 --with-libevent=/Users/rhc/libevent/build --with-hwloc=/Users/rhc/hwloc/build/v2.7.0 --with-pmix=/Users/rhc/pmix/build/v3.0 --without-prte

...

*** Configuring PMIx
checking for pmix pkg-config name... /Users/rhc/pmix/build/v3.0/lib/pkgconfig/pmix.pc
checking if pmix pkg-config module exists... no
checking for pmix wrapper compiler... /Users/rhc/pmix/build/v3.0/bin/pmixcc
checking if pmix wrapper compiler works... no
checking for pmix header at /Users/rhc/pmix/build/v3.0/include... found
checking for pmix library (pmix) in /Users/rhc/pmix/build/v3.0... found -- lib
checking for pmix cppflags... -I/Users/rhc/pmix/build/v3.0/include
checking for pmix ldflags... -L/Users/rhc/pmix/build/v3.0/lib
checking for pmix libs... -lpmix
checking for pmix static libs... -lpmix
checking for pmix.h... yes
checking for PMIx_Init... no
configure: error: External PMIx requested but not found.

Note that the pkg-config directory was introduced in PMIx v3.2 and pmixcc doesn't appear until v4.0. So I fail to understand this statement from @qkoziol:

For a non-PRRTE build, OMPI can accept versions of PMIx as old as 2.0.2

OMPI's v5 configure will fail for anything that old, AFAICT. Perhaps a misspeak?

this has to be a mis-speak.

I think he meant prrte v2.0.2. Not PMIx.

@rhc54
Copy link
Contributor

rhc54 commented Sep 29, 2023

Agreed. FWIW: PMIx v4.0.x is not supported by OMPI v5 - get the following error (and it is just the start):

base/pmix_base_fns.c:43:10: fatal error: src/include/pmix_frameworks.h: No such file or directory
   43 | #include "src/include/pmix_frameworks.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PMIx v4.2 changed this filename to avoid conflicts with PRRTE. Note that this also means anything pre-4.1 is automatically disqualified as the utilities were not exposed for reuse, and OMPI is using them. I'll run a test build on PMIx v4.1 just to see if that can work.

@rhc54
Copy link
Contributor

rhc54 commented Sep 29, 2023

Afraid OMPI v5 with PMIx v4.1 has similar problems:

base/pmix_base_fns.c: In function 'check_pmix_param':
base/pmix_base_fns.c:94:23: error: 'pmix_framework_names' undeclared (first use in this function); did you mean 'pmix_frameworks'?
   94 |     for (n=0; NULL != pmix_framework_names[n]; n++) {
      |                       ^~~~~~~~~~~~~~~~~~~~
      |                       pmix_frameworks

If you want these earlier versions to work, then you'll need to add some #if protection and case support in a number of places to work around these differences. You may also have to add some PMIx utility code for pre-v4.2 support to supply missing functions. I'm seeing attributes and even a few function calls that are not available pre-v4.2, so bottom line is that you'll need to spend some time protecting areas of OMPI.

@qkoziol
Copy link
Contributor Author

qkoziol commented Sep 29, 2023

Looks like you may have a bug in your OMPI v5 PMIx configure code if you intend to support older PMIx versions:

$ ./configure --prefix=/Users/rhc/openmpi/build/v5.0 --with-libevent=/Users/rhc/libevent/build --with-hwloc=/Users/rhc/hwloc/build/v2.7.0 --with-pmix=/Users/rhc/pmix/build/v3.0 --without-prte

...

*** Configuring PMIx
checking for pmix pkg-config name... /Users/rhc/pmix/build/v3.0/lib/pkgconfig/pmix.pc
checking if pmix pkg-config module exists... no
checking for pmix wrapper compiler... /Users/rhc/pmix/build/v3.0/bin/pmixcc
checking if pmix wrapper compiler works... no
checking for pmix header at /Users/rhc/pmix/build/v3.0/include... found
checking for pmix library (pmix) in /Users/rhc/pmix/build/v3.0... found -- lib
checking for pmix cppflags... -I/Users/rhc/pmix/build/v3.0/include
checking for pmix ldflags... -L/Users/rhc/pmix/build/v3.0/lib
checking for pmix libs... -lpmix
checking for pmix static libs... -lpmix
checking for pmix.h... yes
checking for PMIx_Init... no
configure: error: External PMIx requested but not found.

Note that the pkg-config directory was introduced in PMIx v3.2 and pmixcc doesn't appear until v4.0. So I fail to understand this statement from @qkoziol:

For a non-PRRTE build, OMPI can accept versions of PMIx as old as 2.0.2

OMPI's v5 configure will fail for anything that old, AFAICT. Perhaps a misspeak?

this has to be a mis-speak.

I think he meant prrte v2.0.2. Not PMIx.

No, I meant PMIx 2.0.2, but after investigating further, it turns out that the OMPI configure script has a bug: when I pointed the OMPI configure at the OpenPMIx 2.0.2 install directory, it failed to compile against it, but silently reverted back to the internal PMIx submodule (in the 3rd-party directory) and built against that.

Ralph's already performed the other experiments, so I won't duplicate them.

Happy to join a call, if that happens.

Get rid of the table that listed the required support library versions
-- the table format was too limiting.  Instead, add a bunch more
verbiage about the versions that are included in the Open MPI
distribution tarball and the minimum required versions of each of
hwloc, libevent, PMIx, and PRRTE.  Pay special attention to the corner
cases of building PMIx (internal and external), with and without
PRRTE.

Also set the minimum required versions for PMIx and PRRTE in VERSIONS:

* Testing shows that Open MPI v5.0.x requires at least PMIx v4.2.0
* The minimum required version for PRRTE is v3.0.0, but we recommend
  in the docs that users use >=v3.0.1 so that they get a full
  mpirun(1) man page

Finally, also show in the docs the versions of the embedded packages
(hwloc, libevent, PMIx, and PRRTE).  This required adding a little
Python in docs/conf.py to read VERSION files and extract version
numbers from tarball filenames.

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres
Copy link
Member

jsquyres commented Sep 30, 2023

We had a call yesterday (a few hours before @qkoziol's reply here) and came up with a general sentiment of what we wanted to convey in the docs. I have updated this PR to basically have an all-new commit with new verbiage and explanations in the docs, as well as adding a listing of the versions of the embedded packages (hwloc, libevent, PRRTE, OpenPMIx).

Reviewers: Please have a look at how it all looks in the docs now: https://ompi--11875.org.readthedocs.build/en/11875/installing-open-mpi/required-support-libraries.html. Keep in mind that this PR is on main, not v5.0.x. So the PMIx / PRRTE "embedded" versions listed reflect the git submodules on main, not v5.0.x.

Also, we slightly changed the minimum PMIx and PRRTE versions in the VERSIONS file:

  • PMIx minimum 4.2.0
  • PRRTE minimum 3.0.0

However, in checking with @rhc54, it looks like he's done some more testing, and PMIx 4.2.0 is not sufficient. And possibly 4.2.2 is not sufficient, either. Specifically, the MCA conversion code that was ported over to OMPI recently for use in direct launch did not exist in PMIx until pretty recently.

We need to decide what we want to do about that.

@jsquyres jsquyres added this to the v5.0.0 milestone Sep 30, 2023
@qkoziol
Copy link
Contributor Author

qkoziol commented Oct 2, 2023

We had a call yesterday (a few hours before @qkoziol's reply here) and came up with a general sentiment of what we wanted to convey in the docs. I have updated this PR to basically have an all-new commit with new verbiage and explanations in the docs, as well as adding a listing of the versions of the embedded packages (hwloc, libevent, PRRTE, OpenPMIx).

Reviewers: Please have a look at how it all looks in the docs now: https://ompi--11875.org.readthedocs.build/en/11875/installing-open-mpi/required-support-libraries.html. Keep in mind that this PR is on main, not v5.0.x. So the PMIx / PRRTE "embedded" versions listed reflect the git submodules on main, not v5.0.x.

Also, we slightly changed the minimum PMIx and PRRTE versions in the VERSIONS file:

  • PMIx minimum 4.2.0
  • PRRTE minimum 3.0.0

However, in checking with @rhc54, it looks like he's done some more testing, and PMIx 4.2.0 is not sufficient. And possibly 4.2.2 is not sufficient, either. Specifically, the MCA conversion code that was ported over to OMPI recently for use in direct launch did not exist in PMIx until pretty recently.

We need to decide what we want to do about that.

LGTM

@lrbison
Copy link
Contributor

lrbison commented Oct 12, 2023

Some testing By @wenduwan and @janjust have confirmed we need PMIx 4.2.4 at minimum.

@lrbison
Copy link
Contributor

lrbison commented Oct 12, 2023

We talked about this on v5.0.x RM call this morning. Consensus was to leave this PR as is and correct or clarify the complex version dependencies in future PRs. We don't see this as a blocker for release.

@awlauria @janjust ready to merge?

@janjust janjust merged commit fee1d70 into open-mpi:main Oct 16, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

documentation: need to update how-to-build section of Open MPI for the v5.0.0 release
7 participants