WeeklyTelcon_20170620
Geoffrey Paulsen edited this page Jan 9, 2018
·
1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Artem Polyakov
- Jeff Squyres (Cisco)
- Howard Pritchard
- Josh Hursey
- Mohan
- Murali Emani (LLNL)
- Todd Kordenbrock
- David Bernholdt (ORNL)
- Nathan Hjelm
- Ralph
- Brian Barrett (Amazon)
Review All Open Blockers
- just a few PRs going in.
- 3714 - Does shift signal forwarding need to go to 2.0.x?
- This is an enhancement not bugfix for running under SLURM.
- It's a bug because if they are using mpirun on SLURM, scancel won't get the signal.
- It's certainly not a regression.
- LLNL - will handle this with a patch.
- whenever we mess with job termination, it causes issues.
- we'll think about it... no rush for next 2.1.x
- PR3487 -
- Continue to discuss in PR.
- Looked at timelines for 2.0.x and 2.1.x
- No super critical bugs / bugfixes.
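
For readers unfamiliar with the signal forwarding discussed under 3714, here is a minimal sketch of the general idea: a launcher process that relays the signals it receives to the process it started. It is not Open MPI's or SLURM's implementation. The function name `run_with_forwarding` and the `FORWARDED` signal list are hypothetical and chosen only for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical set of signals to relay; a real launcher may choose differently.
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGUSR1, signal.SIGUSR2)


def run_with_forwarding(argv, signals=FORWARDED):
    """Start argv as a child process and relay the given signals to it.

    Returns the child's return code (negative if it was killed by a signal).
    Must be called from the main thread, because signal.signal() requires it.
    """
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Relay only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    # Note: a signal that arrives before these handlers are installed
    # is not forwarded; this sketch accepts that small window.
    previous = {s: signal.signal(s, forward) for s in signals}
    try:
        return child.wait()
    finally:
        # Restore whatever handlers were in place before.
        for s, handler in previous.items():
            signal.signal(s, handler)


if __name__ == "__main__" and len(sys.argv) > 1:
    rc = run_with_forwarding(sys.argv[1:])
    # Map "killed by signal N" to the shell convention 128 + N.
    sys.exit(128 - rc if rc < 0 else rc)
```

Without the relay, a signal sent to the launcher would terminate only the launcher, and the started process would not see it. This is the kind of behavior the bug-versus-enhancement discussion above is about.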
Review Milestones v3.0
- PMIx - PR3696
- IBM will open an issue associated with this.
- When PMIx fixed the IBM Load/Store issue, it opened a can of issues (memory corruption, alignment in the PMIx lib).
- was hitting some hangs and data corruption.
- Still iterating on. Once that's done, can PR it to v3.x
- Ralph thinks he's got that running cleanly now.
- We want these changes inside of v3.0.
- Cisco tests still having some weird issue in their MTT with
- Leave Session Attached is busted.
- PMIx & SLURM
- In SLURM, if you configure with the defaults, you don't get PMIx support.
- In 3.0.x, if you launch directly, the processes all throw an error in MPI_Init().
- Ralph will improve the error message when MPI_Init() can't find a PMIx server.
- Not a blocker, but nice to have.
- Brian is working on Release Template, and will get v3.0 RC out this week.
- Schedule for v3.0 is still end of this month.
- Branching for the next release will happen at the end of the Face-to-Face in July.
- Folks are expected to test the RC.
- Down the road we should make a release tarball each night, and have MTT test THAT nightly.
- Tarballs and git checkouts are built very differently, up until 'make dist' is called.
Review Master Pull Requests
Review Master MTT testing
- Mellanox was having some MTT testing issue, Artem will look at it.
- Mellanox might be seeing it because of deprecated build status stuff.
- Some tests run successfully, but then hang at the end of output and die due to timeout.
- Right now, PR testing builds exactly what the person submitted in the PR.
- But we could build the result AFTER merging the PR and test THAT.
- IBM has seen internally this method has caught a failure before it was merged to the branch.
- Amazon likes this approach also.
- We have always allowed merging to master without a PR, but are trying to make it more attractive to use PRs.
- Still test each commit to master, and also
- ompi_scripts/Jenkins - all available, can make changes there.
- Intel is pushing content somewhat regularly, but unclear how much longer.
- Not seeing much benefit.
- Howard - Trying to use it and trying to work on the viewer.
- Face2Face Meeting-2017-07
- Date: July 11-13 (9am Tuesday - noon on Thursday).
- Cisco has booked space in Chicago.
- Jeff will see about setting up a Web-Ex for those who are interested.
- Please email him if you are interested in attending via Web-Ex.
- Cisco - Focused on release manager things.
- ORNL - IBM helping with some cluster issue.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon,
- Cisco, ORNL, UTK, NVIDIA