WeeklyTelcon_20171219
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Brian
- Howard Pritchard
- Josh Hursey
- Ralph
- Todd Kordenbrock
- Nathan Hjelm
- Thomas Naughton
- Jan 9th:
- Decided last week to push the date to late February or March.
- Discuss abandoning openib btl.
- Want Chelsio and NVIDIA to be part of the discussion.
- Test infrastructure
- Some reliability issues with various Jenkins and MTT setups.
- Need to figure out how to deal with this in a larger context.
- Not sure what to do if someone else's Jenkins fails your PR.
Review All Open Blockers
Review v2.0.x Milestones v2.0.4
- Nothing New, nothing forcing a new release.
Review v2.x Milestones v2.1.2
- A few PRs coming through.
- Bugfix only mode.
- LaunchMON / Allinea attach mode - Issue 3660. This mechanism is part of MPIR.
- Ralph fixed the debugger problem in PR 4630.
- Howard wants Jeff to review this change.
- Need PR4399 only in v2.x branch.
- Need to have a "nice" commit message; the current one is good enough.
- Now can fix ARM tests.
- Schedule: January release.
Review v3.0.x Milestones v3.0
- Schedule: Get v3.0.1 out by end of the week.
- Duped issue: Mpool init hang AND Current blocker: Hang on ARM in v3.0.x
- Only hangs in debug. Bad, but not ship-stopper.
- Doesn't happen in optimized mode
- Issue 4563 - not seeing it on the little ARM boxes here; Jenkins uses --disable-builtin-atomics.
- Because when we disable atomics on PowerPC, the compiler thinks we have cmp-set128.
- On ARM, it uses the old-school lock-based LIFO and FIFO.
- Fix being worked in PR3988 - bug in PGI compiler
- Issue 4509 madvise hook
- Jeff and Howard will discuss.
- Now that we hook madvise, we need to be more careful.
- Nathan hopes his PR 4576 on master will reduce the occurrences to 0, but we need the user to verify.
- May have to invalidate a LARGE region, even though it's mostly valid, just because glibc invalidated a small part of it.
- Tested PR 4576 in master last week.
- Still need to merge into v2.x, v3.0.x and v3.1.x
- Do we need to Pull PR 4628 into v3.0.x?
- broken in v3.0.0 and later, but it's just launch performance not hang.
- decided NOT to block v3.0.1 for this, and fix this in v3.0.2
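The concern noted under Issue 4509, that a small invalidation can force a large and mostly valid region to be dropped, can be illustrated with a toy model. This is only a sketch under stated assumptions: the `RegistrationCache` class, its methods, and the addresses are hypothetical and are not Open MPI's rcache code or the approach taken in PR 4576.

```python
class RegistrationCache:
    """Toy model of a registration cache (hypothetical, not Open MPI's rcache)."""

    def __init__(self):
        # Each registration is a (base_address, length_in_bytes) tuple.
        self.regions = []

    def register(self, base, length):
        self.regions.append((base, length))

    def invalidate(self, base, length):
        """Drop every registration overlapping [base, base + length).

        A registration is dropped whole, even if only a few bytes overlap.
        Returns the list of dropped registrations.
        """
        end = base + length
        dropped = [r for r in self.regions if r[0] < end and base < r[0] + r[1]]
        self.regions = [r for r in self.regions if r not in dropped]
        return dropped


cache = RegistrationCache()
cache.register(0x100000, 1 << 20)   # 1 MiB registration
cache.register(0x300000, 4096)      # unrelated 4 KiB registration

# Assume a hook reports that a single 4 KiB page inside the 1 MiB
# registration is no longer valid.
dropped = cache.invalidate(0x100000 + 8192, 4096)
print(dropped)        # the whole 1 MiB registration
print(cache.regions)  # only the unrelated registration remains
```

In this model the whole 1 MiB registration is lost because one page inside it was invalidated, which is the cost the notes above describe.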
Review v3.1.x Milestones v3.1
- SCHEDULE: Like to get out in late January
- For v3.1.x blockers, please ensure they have both the "target_x" label and the "blocker" label.
- v3.1.0 still has Blocker Issue 4509
- Hope it was fixed in PR 4576 in master tonight, to merge in later.
- Assuming this was fixed (customer didn't reproduce yet)
- Dist Graph Create / Tree Create is still segfaulting - but others can't reproduce.
- Happens sporadically.
- Issue 4303
- maybe turn it off by default?
- Component in topo, creates graphs when you create a communicator.
- If you get a reproducer, then update ticket and hand to George.
- Ralph will try to see if he can give George access.
Review Master Master Pull Requests
- rcache grdma is hitting an assert in Finalize (refcount on object).
- Nathan will look at.
- Seems to be a memory leak in the OMPI Jenkins
- Working on a solution
- Workaround: turn off pipeline builds.
Review Master MTT testing
- Brian sent an email earlier this week about the NEWS file.
- Either we make merging painful for developers, or we create a rather large amount of work for release managers.
- Can automate via Pull Request that ends up in the merge.
- block NEWS: whatever you want NEWS to be.
- With metadata, using Pull Requests, then can change that NEWS block after the fact.
- Would happen at make dist time. Public API calls.
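The NEWS-block idea above could be sketched roughly as follows. This is a minimal illustration, not the project's tooling: the exact `NEWS:` marker syntax, the rule that a blank line ends the block, the function names `extract_news` and `build_news_entries`, and the sample pull request descriptions are all assumptions made for the example. Fetching the descriptions from the hosting service's public API is left out.

```python
def extract_news(pr_body):
    """Return the text of the NEWS block in a PR description, or None.

    Assumed convention (hypothetical): a line starting with "NEWS:" opens
    the block, and the block ends at the first blank line or end of text.
    """
    collected = []
    in_block = False
    for line in pr_body.splitlines():
        stripped = line.strip()
        if not in_block:
            if stripped.startswith("NEWS:"):
                in_block = True
                first = stripped[len("NEWS:"):].strip()
                if first:
                    collected.append(first)
        elif not stripped:
            break  # a blank line closes the block
        else:
            collected.append(stripped)
    return " ".join(collected) if collected else None


def build_news_entries(pr_bodies):
    """Turn merged PR descriptions into NEWS bullet lines, skipping PRs without a block."""
    entries = []
    for body in pr_bodies:
        news = extract_news(body)
        if news is not None:
            entries.append("- " + news)
    return entries


# Made-up descriptions standing in for merged pull requests.
sample_bodies = [
    "Fix debugger attach.\n\nNEWS: Fix a problem with debugger\n  attach mode.\n\nSigned-off-by: Someone\n",
    "Internal cleanup only, no user-visible change.\n",
]
for entry in build_news_entries(sample_bodies):
    print(entry)
```

Because the text lives in the pull request description rather than in a commit, it can be edited after the merge and picked up the next time the entries are generated.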
- WebEx Schedule: WebEx next Tuesday, Dec 19 (unless 0% chance of getting v3.0.1 out)
- Cancel Dec 26,
- Cancel Jan 2nd
- Jeff will create new WebEx URL for 2018.
- See on list email
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA