WeeklyTelcon_20161129
Geoffrey Paulsen edited this page Jan 9, 2018
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen IBM
- Jeff Squyres Cisco
- Artem Polyakov Mellanox
- Josh Hursey IBM
- Joshua Ladd Mellanox
- Ralph
- Ryan Grant
- Sylvain Jeaugey
- Howard
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.5
- 10 open PRs on 1.10.5. (Newly changed in GitHub: look closely under the topic, which should say whether the PR has been approved.) 2 approved, 7 review required, and 1 pushed back.
- The ones that are approved are urgent.
- Schedule a release of 1.10.5 in January.
- Nathan's looking at a segv in PSM2, but not PSM. He will create issue after reproducing.
- Not the known issue with PSM2 - Something about interrupt handler.
-
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
-
Known / ongoing issues to discuss
- STAT Debugger: PR #2411.
- Ralph added 2 more commits to his fork, but need LLNL to test (they're out for 1 week).
- Not a blocker for 2.0.x (IBM can pull directly into Spectrum MPI).
- Any other blockers for 2.0.2?
- blocker: HColl Context Free (PR on 1.10.5, but Mellanox will PR to 2.0.x in next 2 days)
- Coll_Lib_NBC - needs George's review. Adds thread protection for opal_lists. Josh says that George isn't sure if it's complete.
- PR 2461 - in 2.x
- If people are not testing with Async modex + _____, maybe they should.
- For libraries that want all endpoints in Init, using PMIx_Dstore shows a 15% improvement.
-
Schedule - Looking to release 2.0.2 at the end of the week, if everything goes well.
-
PMIx update
- putting job data in the shared memory dstore.
- PR for this, shows memory improvements.
- Seeing some performance problems on Power Arch. dstore is actually showing degradation.
- next week would be earliest for possible RC.
-
OMPI 2.1
- THE blocking issue is PMIx.
- The BSD patcher - Nathan's been asked to work on it. Graceful fail is fine.
-
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
Review Master MTT testing (https://mtt.open-mpi.org/)
- Still no morning messages. Need to pester Brian about it. Apparently not allowed to make changes until after the new year.
- mail from our AWS instance is not getting to us.
- Biggest failures we saw in 2.0.x and 2.1.x
- OSHMEM - The BTL fix fixed a bunch of things, but there are still a few errors (segv; Put or Get on a location that is not registered).
- Jeff will make a ticket for few remaining OSHMEM failures.
- Sylvain seeing a bunch of errors in master oob/ud components
- Mostly timeouts; not sure if they are hanging or just really slow.
- Josh - turned on Jenkins testing at IBM, may result in timeouts. Using PGI on PPC64.
-
Put up a PR for combinatorial executor. Still a bug in submitter.
-
Telcon tomorrow.
-
Face to Face in January - https://github.com/open-mpi/ompi/wiki/Meeting-2017-01
-
SC BOF
- Should we do 2.2 or 3.0? Poll to the community.
- 87% said go for 3.0.
- Went way too long.
- Bad time slot (not sure why): we only had half the people we normally do.
-
PMIx update - Decided to do a PMIx 2.0 release (what was going to be PMIx 3.0) - January time frame.
-
libevent update - they have put out an RC for 2.1.7 (OMPI 2.x is on libevent 2.0)
- 2 years of code changes, though most are not in our usage path.
- Still some somewhat scary changes in the main path, so we need to test well and evaluate before adding to OMPI 2.x.
- There is an external component for libevent, so there is that option.
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM