WeeklyTelcon_20190212
Geoffrey Paulsen edited this page Mar 12, 2019
·
3 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Brian Barrett
- Geoffroy Vallee
- Josh Hursey
- Matias Cabral
- Ralph Castain
- Thomas Naughton
- Todd Kordenbrock
- Xin Zhao
- David Bernholdt
- Matthew Dosanjh
- George
- Akshay Venkatesh
- Edgar Gabriel
- Howard Pritchard
- Josh Hursey
- Aravind Gopalakrishnan (Intel)
- Joshua Ladd
- Nathan Hjelm
- Dan Topa (LANL)
- Akshay Venkatesh (nVidia)
- Arm (UTK)
- Peter Gottesman (Cisco)
- mohan
- The HostGator web site (open-mpi.org) is coming up for renewal. We need to decide what we are going to do about it.
- Expires this summer (July 27th); start working on it in May.
- Need to move domain names. (Who owns that?)
- It'd be nice to move to AWS.
- DNS should be owned by SPI. Still need to transfer that.
- Topic for April.
- Nathan Hjelm's day job will no longer involve Open MPI, so if you want him to review something, please check with him first.
- Next face-to-face is April 23-25 at Cisco in San Jose.
Review All Open Blockers
Review v3.0.x Milestones v3.0.3
- Merging PRs this morning
- Create RC tomorrow.
- Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
Review v3.1.x Milestones v3.1.0
- Merging PRs this morning
- Create RC tomorrow.
- Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
Review v4.0.x Milestones v4.0.1
- Schedule: waiting for Issue6278 fix
- v4.0.0
- Consider disabling pmix-new-shmem mca param. (see PMIx Issue 1114)
- Adding OSHMEM API - bugfix. Need to rev .so versions correctly
- Serious issue https://github.com/open-mpi/ompi/issues/6198, but won't hold v4.0.1
- OOB version checking - discussed in a meeting in December, but nothing was implemented.
- An issue for certain container models (mpirun outside, mpids inside Docker model)
- Cross compatibility issue between versions because of OOB selection logic.
- Some chatter on this PR about how to deal with.
- Could do something for v4.0.1
- https://github.com/open-mpi/ompi/pull/6157
- Could just set the "major" version to 4 for OOB protocol.
- Probably don't need to worry about mpirun-oob trying to talk to orted-oob.
- ssh launches the docker container, and sets env var to container, so mpid has a way to connect back to mpirun
- Think we'll merge this patch, and then file an issue on master to make the OOB backwards compatible with OMPI v4.x.
- That leaves us with the same risk profile as today.
- PMIx handles this compatibility issue: it detects the versions inside and outside and adjusts.
- We cannot guarantee this use case, but we can say we make a "best effort". It's not just the OOB; it also involves OPAL datatypes, the messages we send, and more.
- If we start testing this, people will expect us to fix it when it breaks.
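The "set the major version to 4" idea for the OOB protocol amounts to accepting a peer whenever both sides report the same major version. A minimal sketch of that rule follows, assuming plain dotted version strings; the function names are hypothetical and this is not the code in PR 6157 or anywhere in Open MPI.

```python
def parse_major(version: str) -> int:
    """Return the major number of a dotted version string such as '4.0.1' or 'v4.0.1'."""
    # Drop an optional leading 'v', then keep only the text before the first dot.
    return int(version.lstrip("v").split(".", 1)[0])


def oob_compatible(local_version: str, peer_version: str) -> bool:
    """Hypothetical rule: two endpoints may talk when their major versions match."""
    return parse_major(local_version) == parse_major(peer_version)


if __name__ == "__main__":
    # mpirun outside the container and a daemon inside it, both from the v4.x series.
    print(oob_compatible("4.0.0", "4.0.1"))
    # A v3.1.x daemon talking to a v4.0.x mpirun would be rejected under this rule.
    print(oob_compatible("3.1.3", "4.0.1"))
```

A check like this only covers the connection handshake; as noted above, datatypes and message contents can still differ between versions.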
- Schedule: Delaying post Summer ***
- Discussion of schedule depends on scope discussion
- Do we want to separate ORTE out for that? That would be a bit past summer.
- Gilles has a prototype of PRTE replacing ORTE
- Want to open up release-manager elections.
- Now that we're delaying, will decide at face2face.
- Is anyone pushing for a Summer of 2019 schedule?
- It seems too aggressive to everyone on the call
- One driver was to remove things to break ABI.
- Not a bad idea to DO v5.0, but summer timing is bad.
- Delaying would allow for switching to PRTE.
- PMIx Tools support
- A v4.1 release from master is now a possibility
- If we instead do a v4.1, some things would need to be fixed on master.
- Will discuss more at the face-to-face.
- New alert on the PMIx side: PMIx Issue 1114 - wrong answer in the shared memory component.
- Should disable new shared memory segment in PMIx until resolved.
- Considering adding mca param to disable in v3.1 (internal and external), in v3.0 (external, as internal probably not in that PMIx version)
- PMIx direct call / PRTE replacement for ORTE.
- Howard has been changing the OMPI or OPAL places that call the PMIx framework to use PMIx data structures directly in the code.
- Doesn't look like Howard would step on Ralph's toes.
- March 4th is next MPI Forum (then June)
- We have a new open-mpi Slack channel for Open MPI developers.
- Not for users, just developers...
- Email Jeff if you're interested in being added.
- How do we get more participation and make MTT more meaningful?
Review Master Pull Requests
- Didn't discuss today.
Review Master MTT testing
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA