WeeklyTelcon_20170627
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres (Cisco)
- Howard Pritchard
- Josh Hursey
- Todd Kordenbrock
- David Bernholdt (ORNL)
- Nathan Hjelm
- Ralph
- Brian Barrett (Amazon)
- Artem
Review All Open Blockers
- Targeting the next 2.0.x release for October.
- Targeting the next 2.1.x release for mid-August.
Review Milestones v3.0
- No RC last week.
- Going to merge in last couple of Ralph's PMIx changes.
- Josh Hursey needs to review PR 3754.
- Will create an Open MPI v3.0.0 RC1 today.
- RC1 testing will focus on ORTE launching.
- Some orteds are still getting killed sometimes.
- Some complaints in killing changes
- Larger picture schedule for v3.0?
- Like to get feedback on RC1.
- The v3.0 branch hasn't had a lot of testing so far.
- There are a bunch of MPI layer PRs (some require review)
- two PRs
- ROMIO PR (requires REVIEW)
- RDMA PR (requires REVIEW)
- Any special features for NEWS? Only responses from Mellanox.
- Cisco MTT is turned off; Leave Session Attached is busted.
- IBM added some MPI dependencies in OPAL layer, but no CI caught it.
- Building with autogen.pl -nompi (plus some other flag) would catch abstraction layer violations like this.
- Branching for the next release will happen at the end of the Face to Face in July.
- Folks are expected to test the RC.
- Down the road we should make a release tarball each night, and have MTT test THAT nightly.
- Very different in how they're built, until they call 'make dist'.
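The abstraction-layer concern noted above (MPI dependencies creeping into the OPAL layer without CI noticing) could also be approximated with a plain source scan. The following is only a rough sketch, not an existing Open MPI tool: it assumes the source tree keeps its layers in top-level directories named `opal`, `orte`, `ompi`, and `oshmem`, and that a violation shows up as a quoted `#include` of an upper-layer header from within `opal/`. The function names `find_layer_violations` and `scan_tree` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: these directory names identify layers that sit above OPAL.
UPPER_LAYERS = ("ompi", "orte", "oshmem")

# Matches lines such as:  #include "ompi/some/header.h"
INCLUDE_RE = re.compile(
    r'^\s*#\s*include\s*"((?:%s)/[^"]+)"' % "|".join(UPPER_LAYERS)
)


def find_layer_violations(lines):
    """Return (line_number, header) pairs for upper-layer includes."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = INCLUDE_RE.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


def scan_tree(opal_root):
    """Scan every .c/.h file under opal_root; return {path: hits}."""
    results = {}
    for path in sorted(Path(opal_root).rglob("*")):
        if path.is_file() and path.suffix in (".c", ".h"):
            text = path.read_text(errors="replace")
            hits = find_layer_violations(text.splitlines())
            if hits:
                results[str(path)] = hits
    return results
```

A CI job could call `scan_tree("opal")` from the top of a checkout and fail when the result is non-empty. A scan like this only sees includes, so it would not replace actually building with the upper layers disabled.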
Review Master Pull Requests
- Some corruption in Cray PMIX component on Master, about a week ago.
- Monitoring components - replaces ptraces stuff. Some segv in this.
- Don't think they're supposed to be on by default. Possibly bug in GLUE.
Review Master MTT testing
- Mellanox was having some MTT testing issue, Artem will look at it.
- Mellanox might be seeing it because of deprecated build status stuff.
- Some tests run successfully, but then hang at the end of output and die due to timeout.
- Right now, PR testing builds exactly what the person submitted in the PR.
- But it could build the result AFTER merging the PR and test THAT.
- IBM has seen internally this method has caught a failure before it was merged to the branch.
- Amazon likes this approach also.
- Intel is pushing content somewhat regularly, but unclear how much longer.
- Not seeing much benefit.
- Howard: trying to use it and trying to work on the viewer.
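The "build AFTER a merge of the PR and test THAT" idea discussed above can be sketched as a short sequence of git commands. This is a minimal illustration under stated assumptions, not how any of the existing CI systems is configured: it assumes the remote is named `origin`, the target branch is `master`, and the hosting service exposes each pull request under the `pull/<number>/head` ref (as GitHub does). The helper names `merged_build_plan` and `run_plan` and the PR number used in the usage line are hypothetical.

```python
import subprocess


def merged_build_plan(pr_number, remote="origin", target="master"):
    """Return the git commands that produce 'target + PR' in a scratch checkout."""
    return [
        # Fetch the current tip of the target branch; FETCH_HEAD now points at it.
        ["git", "fetch", remote, target],
        # Work on a detached HEAD so no local branch is modified.
        ["git", "checkout", "--detach", "FETCH_HEAD"],
        # Fetch the PR's head commit; FETCH_HEAD now points at the PR.
        ["git", "fetch", remote, f"pull/{pr_number}/head"],
        # Merge the PR into the freshly fetched target branch.
        ["git", "merge", "--no-edit", "FETCH_HEAD"],
    ]


def run_plan(plan, cwd):
    """Run each command in order, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, cwd=cwd, check=True)
```

For example, `run_plan(merged_build_plan(1234), cwd="/path/to/checkout")` would leave the checkout at the merged result, after which the usual build and test steps can run. If the merge step fails with conflicts, that is itself useful feedback before the PR lands.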
- Face2Face Meeting-2017-07
- Date: July 11-13 (9am Tuesday - noon on Thursday).
- Cisco has booked space in Chicago.
- Jeff will see about setting up a Web-Ex for those who are interested.
- Please email him if you are interested in attending via Web-Ex.
- No Fees at this face to face.
- From mailing list (from SuSE) - Reproducibility of the build.
- They want to be able to binary-compare two builds to see if they are the same, but can't because of the embedded date.
- Lots of pros / cons to having date in build.
- Put it in ompi_info - build host, build date, Manpages (stamped at make dist).
- Maybe add some DATE environment variable to force the build date, post v3.0.
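One way the "DATE env" idea could work is sketched below. The notes do not name a variable; this sketch assumes the `SOURCE_DATE_EPOCH` convention from the Reproducible Builds project (seconds since the Unix epoch), which is a choice made here for illustration and not something decided in the meeting. The function name `build_date` is hypothetical.

```python
import os
import time


def build_date(environ=None):
    """Return the build date string, honoring SOURCE_DATE_EPOCH if it is set."""
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        # Forced timestamp: the same input yields the same stamp on every build.
        stamp = int(raw)
    else:
        # No override: fall back to the current time, as builds do today.
        stamp = int(time.time())
    # Format in UTC so the result does not depend on the build host's timezone.
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(stamp))
```

With the variable set, two builds of the same source would embed the same date string, which removes one obstacle to binary comparison. The build host name mentioned above would need separate handling.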
- dlopen LOCAL is painful - Issue 3705
- Each MCA library should be linked against the libraries it actually depends on.
- We used to link the components against the libraries, but then we stopped.
- Jeff Recalls: But then we stopped because we'd link MPI components against both MPI and ORTE.
- Jeff Recalls: But if you do an upgrade, then you're screwed...
- Brian Recalls: OSX namespacing issue...
- need to do some archeology
- Ralph remembers there was SOME reason we don't do this linkage.
- Not for v3.0; put it on the Face to Face discussion list.
- Maybe add a configure option to do this.
- For v4.0 do we want to keep hwloc internal, or just use external?
- Compromise would be to change the precedence to prefer external over internal for all of our libs?
- Then in a future release, remove internals (or some at least) completely?
- RHEL5 doesn't have hwloc.
- Fixed something that now allows Open MPI to use older hwloc 1.3, 1.4, 1.5 or something, but still not v1.0.
- What to do about libevent? - look at all of them at face to face.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA