forked from open-mpi/ompi
WeeklyTelcon_20170327
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Brian Barrett
- Geoffroy Vallee (ORNL)
- David Bernholdt (ORNL)
- Howard
- Josh Hursey
- Joshua Ladd
- Ralph
- Nathan Hjelm
Review All Open Blockers
- No news is good news.
- A couple of pull requests that are in push-back state.
- reviewer wanted something, or just not done.
- Don't have a time-line, but will need one.
- April 8th sounds like a good arbitrary date.
Review Milestones v2.1.x
- Howard pushed some PRs into v2.x (after v2.1.0 released)
- Did not create a v2.1.x branch (v2.1.0 branched off of
- Question: Can vendors backport features to 2.x branch and maintain it as a vendor specific branch?
- We haven't done this in the past, but we could talk about it.
- Some pushback about this.
- Maybe the question is, should there be a v2.2 ?
- On v1.10 we allowed PRs against the branch, even though there was no intention to release.
- Some not in favor of v2.2, since we now have 4 month cycle for branching.
- v3.0 has already diverged so much from v2.1
- Takes a lot of testing to retest.
- Mellanox is not interested in releasing a v2.2, but is interested in branching v2.1 out, and then having a v2.x that they can push to.
- Two options:
- Could make a branch with no goal of releasing. Somewhat unclear to customers that Open MPI wouldn't release.
- If we make a branch for v2.2 and try to have a release, that seems like a lot of extra testing on top of our 4 month release schedule.
- If we create the v2.2 branch, and no one else chooses to push there, how does that affect others?
- If we have a branch in OMPI tree, what do we need?
- If we have a release branch, we need a release manager, and CI testing, etc.
- So what is after 3.0? Is it a 3.1?
- If we want new OSHMEM changes, it needs to be a new major version.
- What we discussed was to do a release every 4 months (either major or minor). Whether it is major or minor depends on backward compatibility.
- If there are changes that break backward compatibility, it will be 4.0; if not, it will be 3.1.
- Much confusion about where to target new features and bugfixes, etc.
- A lot of discussion about stabilizing.
- branch on a date / release by a date - needed for regularly scheduled releases.
- If someone wants to push new minor features / bugfixes on a release branch.
- Should have some level of CI testing, and a release manager.
- IBM described their current process, and how painful it is to maintain.
- Changes downstream without getting them upstream first.
- For Vendors if we keep the upstream simple and date-based.
- Time-based releases give suborganizations a little freedom to plan.
- Solid timeline seems like a really good idea for roadmap planning.
- Bumps from transitioning to new system.
- Cisco is 100% upstream. Just got work from a year ago from upstream.
- Hasn't always been, and might not be in the future.
- If vendor could have absolute confidence of the release time of a release.
- We are doing two things to try to hit the v3.0 schedule:
- Whitelisted a few items on v3.0, and been firm on that.
- said no to v2.2
- had to end the conversation early due to time.
- We branched for v3.x - So don't forget to PR over to v3.x when PRing over to v2.x
- Everything is off the whitelist, except PMIx.
- PMIx - reason we're doing an accelerated v3.0
- Discussed PMIx PR 3194 - got pushed to v3.1
- Whitelist Issue 3107
- UCX got in.
- Ralph working on job control monitoring RFC.
- Just finishing integration of this.
- The only other major piece is the messaging compatibility piece.
- Still on track.
- No status this week.
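
The major-versus-minor rule discussed above (a release every 4 months, numbered 4.0 if backward compatibility breaks and 3.1 if it does not) can be sketched roughly as follows. This is only an illustration of the rule as recorded in these notes; the function name `next_version` and the `breaks_compat` flag are hypothetical and are not part of any Open MPI tooling.

```python
from typing import Tuple


def next_version(current: Tuple[int, int], breaks_compat: bool) -> Tuple[int, int]:
    """Return the (major, minor) number of the next scheduled release.

    Hypothetical helper: bumps the major number when the release breaks
    backward compatibility, otherwise bumps the minor number.
    """
    major, minor = current
    if breaks_compat:
        # Incompatible changes start a new major series at minor 0.
        return (major + 1, 0)
    # Compatible changes stay in the same major series.
    return (major, minor + 1)


# After v3.0: a compatible release would be 3.1, an incompatible one 4.0.
print(next_version((3, 0), breaks_compat=False))
print(next_version((3, 0), breaks_compat=True))
```

Under this sketch the release date is fixed by the schedule, and only the version number depends on what landed in the branch.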
Review Master Pull Requests
Review Master MTT testing
- Looking pretty good.
- Nvidia cluster seems to have something wrong, most unusual errors.
- Thoughts about removing Travis?
- Can turn off the Mac parts of Travis, but nervous to turn off the rest of Travis until after AWS is online.
- Should do a comparison of coverage of Travis and what others are testing.
- Howard will remove Mac OS testing from Travis... we'll keep the rest in Travis for now.
- We should begin thinking about scheduling our next face to face.
- Geoff will put out doodle for June and July and begin to nail down a schedule.
- Cisco has a site in Chicago.
- Prefer not the last week of July.
- Dallas, San Jose, Seattle
- Cisco, IBM, ORNL, UTK, NVIDIA, Amazon
- IBM - Spectrum MPI based on v2.0.2 is in the field and working well.
- ORNL set up MTT to do some testing.
- Amazon - build scripts for PMIx / hwloc - Some news about scripts not cleaning up correctly after themselves.
- Want to start using s3 instead of gatorhost for storing nightly tarballs.
- Also, eventually all tarballs will move out of ompi repo, so our main repo will be small again.
- Release work - takes time.
- Trying to hire staff.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu