WeeklyTelcon_20180417
Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Josh Hursey
- Brian
- David Bernholdt
- Edgar Gabriel
- Howard Pritchard
- Nathan Hjelm
- Thomas Naughton
- Todd Kordenbrock
- Xin Zhao
Review All Open Blockers
Review v2.x Milestones v2.1.3
- v2.1.4 - Targeting Oct 15th.
- Merged in a bunch of stuff.
- One-sided multithreaded bugs that came up.
- Doesn't feel worth fixing in v2.1.x, so instead pulled configury changes from v2.0 to v2.1.x.
- No new news on v2.1.x
Review v3.0.x Milestones v3.0.2
- v3.0.1 went out the door.
- Oops, did not get the PMIx compatibility pieces into the embedded PMIx.
- v3.0.2 open for bugfixes. Quick turnaround on this.
- Shooting for May 1st.
- Will pre-emptively fix PMIx compatibility pieces to pick up PMIx v1.2.5 clients.
- This will bring in PMIx compatibility with OMPI client (mpirun/orted/libmpi) from OMPI v2.1.3
- memkind disable needs to get into v3.0.2; either taken care of or waiting to be taken care of.
- PR (fix ppc64-big-Endian) can't merge until 4563 is merged.
- Thought Nathan was going to fix the hang, and then merge.
- Given this is the same issue as ARM, where we don't have a block, thought we'd just remove
- We now understand the problem: it is not silent data corruption, just a hang.
Review v3.1.x Milestones v3.1.0
- Schedule - ASAP - but blockers keep getting filed.
- No one seems particularly eager to get it out.
- Not getting any
- blockers
- One is a high level of failures in Cisco MTT. Pretty sure it's not unique to v3.1.x; it is also happening on v3.0.x.
- --plm_base_verbose is the only 'non-default' flag setting.
- Under Slurm with mpirun, not direct launch.
- Jeff still investigating.
- UCX OSC is failing in IBM tests on v3.1.x and on master. Geoff will post an issue and tag @xin on it.
- Not going to merge in PMIx patch, unless someone says they really want it. Would require a new RC.
Review Master Master Pull Requests
- OSHMEM v1.4 - not sure if we have to drop the deprecated APIs; curious whether OMPI is dropping deprecated APIs...
- Only remove things that were removed from the OSHMEM standard, not things that are deprecated: "deprecated" means it will be removed in a future version of the standard. If some APIs were removed from the standard, then ask the oshmem email list for their thoughts.
- v4.0 release manager
- Howard and Geoff have volunteered, but we can have other volunteers.
- Start talking about it now. Plan to branch mid-July.
- Brian has automated much of the manual work, so now much of the work is coordinating with members, and encouraging testing, and fixing, etc.
- If others are interested please volunteer.
- Want to understand PMIx cross version compatibility.
- Jeff Squyres and Josh talked about PMIx cross version, pulled in Howard and Ralph and discussed some more.
- Josh shared a link to a google doc of a matrix of data
- Living Google Doc is: here
- Tab1: PMIx client to PMIx server
- Note: in PMIx >= v2.1.x, PMIx added a handshake to see what the client and server support.
- There were some issues with pmix component dstore that broke some of the handshake in v2.???
- Tab2: Open MPI with External PMIx
- Amazon has compute resources, but not time for test development, etc.
- Trying to get Singularity and Charliecloud to help with some of this testing, as it impacts them the most.
- Should just need to test PMIx APIs
- PMIx community is continuing to discuss.
- VOLUNTEER NEEDED: If you have time to work on MTT, please volunteer - Perl or Python or whatever you want.
- Get copy of perl JSON, and put it on MTT.
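As a rough illustration of the client-to-server matrix discussed above, the sketch below enumerates every client/server pairing and flags the pairs where both sides are at v2.1 or later, i.e. where the handshake mentioned in the notes would be available. The version list and the helper names (`PMIX_VERSIONS`, `has_handshake`, `build_matrix`) are hypothetical and chosen only for illustration; the actual tested versions and results live in the Google Doc, and this does not call any PMIx API.

```python
from itertools import product

# Hypothetical version list for illustration only; the real matrix is in the Google Doc.
PMIX_VERSIONS = ["1.2.5", "2.0.3", "2.1.1", "3.0.0"]


def parse(version):
    """Turn a dotted version string such as '2.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_handshake(version):
    """Per the notes, the client/server handshake exists in PMIx >= v2.1.x."""
    return parse(version) >= (2, 1)


def build_matrix(versions):
    """List every client/server pairing and whether both sides can handshake."""
    rows = []
    for client, server in product(versions, repeat=2):
        rows.append({
            "client": client,
            "server": server,
            "handshake": has_handshake(client) and has_handshake(server),
        })
    return rows


if __name__ == "__main__":
    for row in build_matrix(PMIX_VERSIONS):
        mark = "handshake" if row["handshake"] else "no handshake"
        print(f"client {row['client']:>6} -> server {row['server']:>6}: {mark}")
```

Each row is one candidate test case for MTT or a similar harness; whether a given pairing actually works has to come from running it, not from this table.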
Review Master MTT testing
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA