-
Notifications
You must be signed in to change notification settings - Fork 864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump required versions of OpenPMIX and PRTTE #11875
Conversation
@rhc54 please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on discussion in issue #10657, this change looks good.
PRRTe v3.0.1 is unreleased - so we'll want to hold off merging until it is, right? |
I've asked Ralph about his opinions around this. I'd like to list 3.0.1 as the minimum, ask Ralph to release 3.0.1, update the submodule pointer for PRRTE, then make the next rc. |
PRRTE v3.0.1 is on "hold" pending completion of the docs refactoring being done by @jsquyres |
OK, that doc work is also a blocker for the OMPI 5.0 release. So, we should be able to sequence things:
@awlauria - Yes? |
@qkoziol yes. Sounds right. |
This submodule update for |
This isn't a submodule update - it's just updating the version file. |
@jsquyres do you mean to make sure that main is still able to build based on the (relatively) old versions of prrte and pmix on ompi main? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(...crawling back into my hole...)
No. I just got myself confused on by all the submodule PRs that were opened / closed / commented on in the past 24 hours. Somehow I thought this was one of those, but it's not -- it's a good ...but as I say that, now I'm wondering if I remember how |
At the very least - these VERSION are propagated to the docs here automatically: https://docs.open-mpi.org/en/v5.0.x/installing-open-mpi/required-support-libraries.html I think it is also enforced in ompi configure: https://github.com/open-mpi/ompi/blob/v5.0.x/config/opal_config_pmix.m4#L195 though I admit I have not tried it..in a long time..if ever. I can take the action to see that is the case for pmix/prrte after this gets in. |
At least in PMIx/PRRTE themselves, I don't believe they are used in the configure logic. In fact, I just discovered that the min version of PMIx shown in the PRRTE VERSION file was wrong and configure was looking at a different value! Fixed it prior to release just to keep it clean. IIRC, someone put them there (wasn't me) to help collect the info in one place for ease of reference. Not sure if something actually uses those values or not - if not, then we probably should remove them as they get stale and will confuse people. |
They definitely should not be removed as it is used for doc generation. It looks like it is enforced in the ompi configure logic, but I can check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changed that OMPI 5.0.0 no longer works with PMIx 4.1.2 or PRRTE 2.0.2? Not has all features, but works with?
The VERSION file versions are not the versions that are shipped with OMPI, but the minimum versions that this version of OMPI supports (internal or external).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now that I'm remembering what these versions are used for, I agree with @bwbarrett (and shame on me for not checking before I blindly approved the PR; bad Jeff!!).
If we want to document minimum and recommended versions, that's a different thing. But right now, these are minimum supported versions.
Well, yes and no. The problem you still need to address is min PMIx required when building with PRRTE vs without PRRTE. The two are quite different - or at least, they can be quite different if you protect code regions that require more advanced PMIx definitions. I don't believe your configure deals with that just yet. |
Gets a little tricky, doesn't it? If I specify an external PMIx but don't build the internal PRRTE against it, then that would be one minimum version. If I specify an external PMIx and do build the internal PRRTE, then that is a different minimum version. Bottom line: I don't believe the current VERSION values truly convey the correct information. It isn't a single value, but more like a matrix. |
Yeah, I was surprised that OMPI would work with a version that old also. As I said, I think we should leave the minimum version for PMIx in the file alone (at 4.1.x) and update only the PRRTE version and the documentation in the direction that Ralph mentioned (about external PRRTE, etc). |
Sorry - I was only using 2.0 as an example as that is what you had cited as min that worked. My point remains the same: I cannot support PMIx versions older than v4.2.x. I grok that you might want to quote a lower minimum, and that's fine - I get it. Unfortunately, "works with" implies "is supported". I only want to ensure we are all on the same page that OMPI will be taking responsibility for responding to issues involving OMPI v5 working with PMIx versions older than v4.2.x. No different than what you now do for OMPI v4.x with respect to ORTE and PMIx v3.x. Just don't want any future misunderstandings when I refuse issues filed by OMPI users that ran with older PMIx versions. I will at most suggest that they upgrade and try again, which I suspect is probably where you might also start. |
Yep, I'm totally fine with what you say. Just need consensus from the core & RM devs who have commented on this ticket and requested changes. (@bwbarrett & @jsquyres , at this point) |
I mostly disagree with Ralph here; unless we have reason to, we shouldn't move the required versions. Long term, it's going to be bad mojo to push customers to update what we hope is a system library. If we know something is wrong with an old version and new OMPI, then sure, we should update. But similar to Libevent or hwloc, there is a huge cost to our customers to unnecessarily move the required version. |
Recall that the prrte 2.0 command line is completely different, and probably broken from what is in 3.0. It's likely anyone using a 2.0 prrte will have various command line issues, and will probably raise these issues in ompi. To which our reply will always be..upgrade to v3.0.1. This just screams unnecessary headache to me. |
I'm not sure I agree on the prrte 2.0, as you're using someone's existing setup, but I have fewer complaints there than with PMIX. However, I can't see why we'd say 3.0.0 doesn't work. |
Sure, we can move this to prrte v3.0 instead of v3.0.1. However, it says in the prrte v3.0 release notes:
|
We might benefit from a 30-min webex to sort this all out (Vs. a zillion more comments here on a very detailed, confusing topic). Also, I'd like to remind everyone that the PMIX/PRTE version numbers in My point: let's decide what we want to say -- about minimum versions, about supported versions, about recommended versions, about ... whatever we want -- and then we can adjust |
Call good. Especially since I disagree with something Jeff said about VERSION in his call for a call :). |
I'm fine with bumping this to prrte v3.0.0 (instead of v3.0.1) and documenting recommended versions. |
Looks like you may have a bug in your OMPI v5 PMIx configure code if you intend to support older PMIx versions: $ ./configure --prefix=/Users/rhc/openmpi/build/v5.0 --with-libevent=/Users/rhc/libevent/build --with-hwloc=/Users/rhc/hwloc/build/v2.7.0 --with-pmix=/Users/rhc/pmix/build/v3.0 --without-prte
...
*** Configuring PMIx
checking for pmix pkg-config name... /Users/rhc/pmix/build/v3.0/lib/pkgconfig/pmix.pc
checking if pmix pkg-config module exists... no
checking for pmix wrapper compiler... /Users/rhc/pmix/build/v3.0/bin/pmixcc
checking if pmix wrapper compiler works... no
checking for pmix header at /Users/rhc/pmix/build/v3.0/include... found
checking for pmix library (pmix) in /Users/rhc/pmix/build/v3.0... found -- lib
checking for pmix cppflags... -I/Users/rhc/pmix/build/v3.0/include
checking for pmix ldflags... -L/Users/rhc/pmix/build/v3.0/lib
checking for pmix libs... -lpmix
checking for pmix static libs... -lpmix
checking for pmix.h... yes
checking for PMIx_Init... no
configure: error: External PMIx requested but not found. Note that the pkg-config directory was introduced in PMIx v3.2 and
OMPI's v5 configure will fail for anything that old, AFAICT. Perhaps a misspeak? |
Just a reminder, in case you've forgotten: PRRTE v3.0.0 does not have your So you may need to carefully define what you mean by "minimum required" for PRRTE. Note that a user could probably launch their job using ORTE from an earlier version of OMPI, if that's the only requirement. |
this has to be a mis-speak. I think he meant prrte v2.0.2. Not PMIx. |
Agreed. FWIW: PMIx v4.0.x is not supported by OMPI v5 - get the following error (and it is just the start): base/pmix_base_fns.c:43:10: fatal error: src/include/pmix_frameworks.h: No such file or directory
43 | #include "src/include/pmix_frameworks.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PMIx v4.2 changed this filename to avoid conflicts with PRRTE. Note that this also means anything pre-4.1 is automatically disqualified as the utilities were not exposed for reuse, and OMPI is using them. I'll run a test build on PMIx v4.1 just to see if that can work. |
Afraid OMPI v5 with PMIx v4.1 has similar problems: base/pmix_base_fns.c: In function 'check_pmix_param':
base/pmix_base_fns.c:94:23: error: 'pmix_framework_names' undeclared (first use in this function); did you mean 'pmix_frameworks'?
94 | for (n=0; NULL != pmix_framework_names[n]; n++) {
| ^~~~~~~~~~~~~~~~~~~~
| pmix_frameworks If you want these earlier versions to work, then you'll need to add some |
No, I meant PMIx 2.0.2, but after investigating further, it turns out that the OMPI configure script has a bug: when I pointed the OMPI configure at the OpenPMIx 2.0.2 install directory, it failed to compile against it, but silently reverted back to the internal PMIx submodule (in the 3rd-party directory) and built against that. Ralph's already performed the other experiments, so I won't duplicate them. Happy to join a call, if that happens. |
5d76d7b
to
9455f22
Compare
Get rid of the table that listed the required support library versions -- the table format was too limiting. Instead, add a bunch more verbiage about the versions that are included in the Open MPI distribution tarball and the minimum required versions of each of hwloc, libevent, PMIx, and PRRTE. Pay special attention to the corner cases of building PMIx (internal and external), with and without PRRTE. Also set the minimum required versions for PMIx and PRRTE in VERSIONS: * Testing shows that Open MPI v5.0.x requires at least PMIx v4.2.0 * The minimum required version for PRRTE is v3.0.0, but we recommend in the docs that users use >=v3.0.1 so that they get a full mpirun(1) man page Finally, also show in the docs the versions of the embedded packages (hwloc, libevent, PMIx, and PRRTE). This required adding a little Python in docs/conf.py to read VERSION files and extract version numbers from tarball filenames. Signed-off-by: Jeff Squyres <[email protected]>
9455f22
to
fea96e1
Compare
We had a call yesterday (a few hours before @qkoziol's reply here) and came up with a general sentiment of what we wanted to convey in the docs. I have updated this PR to basically have an all-new commit with new verbiage and explanations in the docs, as well as adding a listing of the versions of the embedded packages (hwloc, libevent, PRRTE, OpenPMIx). Reviewers: Please have a look at how it all looks in the docs now: https://ompi--11875.org.readthedocs.build/en/11875/installing-open-mpi/required-support-libraries.html. Keep in mind that this PR is on Also, we slightly changed the minimum PMIx and PRRTE versions in the
However, in checking with @rhc54, it looks like he's done some more testing, and PMIx 4.2.0 is not sufficient. And possibly 4.2.2 is not sufficient, either. Specifically, the MCA conversion code that was ported over to OMPI recently for use in direct launch did not exist in PMIx until pretty recently. We need to decide what we want to do about that. |
LGTM |
EDIT by @jsquyres: This PR has changed considerably since it was originally filed. Below is the new commit message, which describes what this PR now is.
docs: Better explain required versions of support libraries
Get rid of the table that listed the required support library versions -- the table format was too limiting. Instead, add a bunch more verbiage about the versions that are included in the Open MPI distribution tarball and the minimum required versions of each of hwloc, libevent, PMIx, and PRRTE. Pay special attention to the corner cases of building PMIx (internal and external), with and without PRRTE.
Also set the minimum required versions for PMIx and PRRTE in VERSIONS:
Finally, also show in the docs the versions of the embedded packages (hwloc, libevent, PMIx, and PRRTE). This required adding a little Python in docs/conf.py to read VERSION files and extract version numbers from tarball filenames.
Fixes #10657