rhc54 edited this page Jun 24, 2015 · 72 revisions

June 2015 OMPI Developer's Meeting

This is a standalone meeting; it is not being held in conjunction with an MPI Forum meeting.

Logistics

Doodle for choosing the date: http://doodle.com/4arc4ciiby2ve222

  • Date: 9am Tuesday, June 23 through 3pm Thursday, June 25
  • Location:
    • Cisco Building 1, 3850 Zanker Road, San Jose, California 95134
    • Tuesday: Mt. Everest conference room
    • Wednesday: Mt. Everest conference room
    • Thursday: Mt. Aetna conference room

Link to January meeting wiki notes.

Attendees

Local attendees:

  1. Jeff Squyres - Cisco
  2. Ralph Castain - Intel
  3. Nathan Hjelm - LANL
  4. Dave Solt - IBM
  5. Dave Goodell - Cisco (may not be there for the whole meeting)
  6. Howard Pritchard - LANL
  7. Shinji Sumimoto - Fujitsu
  8. George Bosilca - UTK
  9. Devendar Bureddy - Mellanox
  10. Jithin Jose - Intel
  11. Yohann Burette - Intel
  12. Rolf vandeVaart - NVIDIA
  13. Andrew Friedley - Intel

Topics to discuss

  • Review action items from Jan meeting
  • PMIx integration and API extension for PMIx v2.0
  • "Instant On" status and planning
    • Async add procs
    • Direct modex support
    • Distributed mapping
  • Thread multiple support status
  • ORTE process name change - replace jobid with namespace
  • Coverity update
  • Plan for v2.x

Results

  • Version number scheme / roadmap announcement to users
    • Jeff's slides as pptx
    • Jeff's slides as PDF
    • After lengthy discussion, the slides were cleaned up. The major decision was to revise the definition of backward compatibility to be "binary compatible + CLI + MCA params"
    • v1.10.0 will not meet this definition relative to the v1.8 series
    • NEWS will contain a list of CLI and MCA param changes
  • Plan for v1.10.x
    • Feature-complete once PSM2 PR is committed
    • Mellanox may have some PowerPC contributions, mostly bug fixes/optimizations
  • MPI 3.1 - are we ready?
    • Jeff is the blocker (Fortran changes)
    • Edgar: need to know the plans for non-blocking I/O
    • Nathan is looking at the MPICH generalized request code to see if we can bring it into OMPI to enable ROMIO 3.1 support
    • Howard has a wiki page tracking MPI-3 compliance. All of the 3.0 errata and outstanding 3.1 tickets were reviewed and assigned to people
  • Open MPI legacy code: can we chop off support for some older systems?
    • E.g., can we force users to use compilers with <stdbool.h>? (has implications in opal_config/opal_config_bottom.h)
    • C99 requires that <stdbool.h> exist, so this check is stale and can be removed
    • Nathan points out that other headers and types are in this category, so we need to scrub the entire configure system to remove extraneous header and type checks (full list TBD)
  • Git / github usage. How's it going? What's going well / not well? What can we improve on?
    • One idea: should everything on master be a pull request (just to get smoke testing on a variety of systems)? NOTE: NOT advocating a code czar -- anyone can still push the merge button.
    • Jenkins usage on PRs.
    • LANL work on a Jenkins aggregator for Github.
    • Other Github webhooks that might be useful?
    • Anything else we want to tweak?
    • Cisco et al. will work on completing the Jenkins aggregator project and improving both the throughput and reliability of the service
    • We strongly encourage developers to use PRs to bring changes into master. Once the testing support has been improved, we may change this policy to a requirement
    • We will look at the possibility of pulling all outstanding master PRs into an integrated tarball and running it through MTT on a nightly basis
  • coll/ml discussion: get rid of this component completely?
  • Re-introducing Microsoft Windows support (http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/)
    • IBM to look at what would be required
    • Requested that IBM provide a Jenkins-like tester so we can know when we break it
    • Probably want a hand-coded .project file, plus a way to mark components that cannot work under Windows as no-build (either by omitting their CMake file or by some other mechanism)
    • Windows support was removed in open-mpi/ompi@a4b6fb241fe0bdf082431e8a380c1a1ab8b25799
  • collectives and CID allocation
    • George will push a commit to improve CID allocation algorithm
    • George is going to look at reviving the hier coll component and compare its performance to coll/ml
  • Fujitsu Development Status and Some Topics towards Next MPI development
  • revisit mtl one-sided support
    • Decided that we will extend the MTL interface to add one-sided APIs
    • Nathan will provide an RFC of the revised APIs
  • libfabric support
    • is being used in multiple places within the code base
    • will add opal/mca/common/libfabric to centralize some of the functions
  • Can we add new environment variable CUDA_AWARE_SUPPORT and also create info key on MPI_COMM_WORLD for runtime detection?
    • use MPI_T to access the control variable, which is read-only
    • add the "macro" as an extension; the configure logic needs work so that the extension is built whenever --enable-cuda is specified
    • Ralph volunteered to help Rolf out by creating the extension directory and creating the required configure logic

Presentation Material

  • ...fill in content here