Howard Pritchard edited this page Jan 28, 2015 · 93 revisions

January 2015 OMPI Developer's Meeting

This is a standalone meeting; it is not being held in conjunction with an MPI Forum meeting.

Logistics

Doodle for choosing the date: https://doodle.com/zzaupgxge9y6medu

  • Date: 9am Tuesday, January 27 through 3pm Thursday, January 29, 2015
  • Location: Cisco Richardson facility (outside Dallas), building 4:

Cisco Building 4
2200 East President George Bush Highway
Richardson, Texas 75082-3550

Google maps link: https://goo.gl/maps/SNrbu

Attendees

Local attendees:

  • (*) Jeff Squyres - Cisco
  • (*) Howard Pritchard - Los Alamos
  • (*) Ralph Castain - Intel
  • (*) George Bosilca - U. Tennessee, Knoxville
  • (*) Dave Goodell - Cisco
  • (*) Edgar Gabriel - U. Houston
  • (*) Vish Venkatesan (not Tuesday) - Intel
  • (*) Geoff Paulsen - IBM
  • (*) Joshua Ladd - Mellanox Technologies
  • (*) Rayaz Jagani - IBM
  • (*) Dave Solt - IBM
  • (*) Perry Schmidt - IBM
  • (*) Naoyuki Shida - Fujitsu
  • (*) Shinji Sumimoto - Fujitsu
  • (*) Stan Graves - IBM
  • (*) Mark Allen - IBM
  • ...please add your name if you plan to attend...

(*) = Registered (by Jeff)

Remote attendees

  • Nathan Hjelm - Los Alamos
  • Ryan Grant - Sandia (planning to attend for the MTL and 1.9 branch discussions)

Topics still to discuss

Wed afternoon (in priority order)

  • Vish: Memkind integration: see http://www.open-mpi.org/community/lists/devel/2014/11/16320.php
  • Fujitsu: future plans for Open MPI development
  • Jeff: Progress on thread-multiple support
  • Ralph: Collective switching points & MPI tuning params - what is required to change them. Had a discussion brought up by Mellanox, and we never finished this.
  • Intel/LANL: MTL selection issue (PSM vs. OFI)

Thurs morning

To be Scheduled

  • Nathan: Performance of freelists and other common OPAL classes with OPAL_ENABLE_MULTI_THREADS==1 (as discussed in [GitHub]). Part of this is done already -- LIFO is a bit faster now (with threads), etc.
  • Ralph/Nathan: MTL overhead reduction
  • Nathan: Enhance MTL interface to include one-sided and atomics
  • Jeff: MPI extensions: MPIX_ prefix, or OMPI_ prefix?

Deferred

  • Ralph: RTE-MPI sharing of BTLs

Since this will be a full meeting in itself, we'll have a good amount of time for discussion, design, and for hacking!

Resolved

  • Jeff/Howard: Branch for v1.9

    • See Releasev19 wiki page
    • We need to make a list of features for v1.9.0 to see if we're ready to branch yet
  • Jeff: libtool 2.4.4 bug / libltdl may no longer be embeddable. Should we embed manually, or should we just tell people to have libltdl-devel installed?

    • Resolved: let's stop embedding; we'll always link against external libltdl.
    • However: this means people need to have the libltdl headers installed (e.g., the libltdl-devel RPM). We don't mind telling developers to do this, but we are a little worried about telling users to do this, because it raises the bar for building Open MPI: the assumption is that libltdl-devel is almost certainly not installed on most user machines.
    • The question becomes: what is configure's default behavior when it can't find ltdl.h?
      1. Abort
      2. Just fall back to --disable-dlopen behavior (i.e., slurp in plugins)
    • Let's bring up the "default behavior" issue as an RFC / beer discussion.
  • Jeff/Howard: Jenkins integration with Github:

    • how do we do multiple Jenkins servers? (e.g., running at different organizations)
    • much discussion in the room. Seems like a good idea to have multiple Jenkins polling github and running their own smoke tests. Need to figure out how to have them report results. Mike Dubman/Eugene V/Dave G will go investigate how to do this.
  • Howard/George: fate of coll ML

    • see http://www.open-mpi.org/community/lists/devel/2015/01/16820.php
    • who owns it?
    • should we try to fix it or disable by default?
    • Point was raised that coll/ml is very expensive during communicator creation -- including MPI_COMM_WORLD. Should we delete coll/ml? George asked Pasha; Pasha is checking.
    • Pasha: disable it for now, ORNL will fix and re-enable
    • DONE: George opal_ignore'd the coll/ml component

  • Ralph: Scalable startup, including:

    • Current state of opal_pmix integration
    • Async modex, static endpoint support
    • Re-define the role of PML/BTL add_procs: need to move to a more lazy-based setup of peers
    • Memory footprint reduction
    • Resolved:
    • Revive sparse groups
      • Edgar checked: passes smoke test today
      • first phase: replace ompi_proc_t array with pointer array to ompi_proc_t's
        • investigate further reduction in footprint
          • very simple, 1-way static setup of group hash, currently optimized for MCW (MPI_COMM_WORLD)
    • remove add_procs from MPI_Init unless preconnect called
      • PML calls add_procs with 1 proc on first send to peer
        • need centralized method to check if we need to make a proc (must be thread safe)
        • may need to poll BTLs...etc. Expensive! Async? Must also be done thread safe
        • still a blocking call
        • Nathan: if one-sided calls BTLs directly, then need to check/call add_procs
      • call add_procs with all procs for preconnect-all and in connect/accept, or if PML component indicates it needs to add_procs with all procs
      • need to check with MTL owners on impact to them
      • will only add_procs a peer proc at most once before it is del_proc'd
    • del_procs needs to release memory and NULL the proc entry to ensure that you get NULL when you next look for the proc
    • differentiate between "I need a proc for..."
      • communication
      • non-communication
    • need to check BTL/MTLs to see how they handle messages from peers that we don't have an ompi_proc_t for
      • need way for BTL/MTL to upcall the PML with the message so the PML can create a new ompi_proc_t, call add_proc, handle message
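
The lazy peer setup resolved above can be sketched in C as follows. This is only an illustration under stated assumptions, not the Open MPI implementation: `proc_t`, `proc_table_t`, `get_proc_for_comm`, `peek_proc`, `add_one_proc`, and `del_proc` are hypothetical names standing in for `ompi_proc_t`, the pointer array, the centralized lookup, and the PML's single-proc add_procs call. A plain pthread mutex stands in for whatever thread-safety mechanism would actually be chosen.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for ompi_proc_t; the real structure has more fields. */
typedef struct {
    int rank;
    int added;               /* set once the add_procs stand-in has run */
} proc_t;

/* Hypothetical table: a pointer array whose entries stay NULL until needed. */
typedef struct {
    proc_t **procs;
    int size;
    int add_calls;           /* counts add_procs stand-in calls, for illustration */
    pthread_mutex_t lock;
} proc_table_t;

static int table_init(proc_table_t *t, int size) {
    t->procs = calloc((size_t)size, sizeof(proc_t *));
    if (NULL == t->procs) return -1;
    t->size = size;
    t->add_calls = 0;
    return pthread_mutex_init(&t->lock, NULL);
}

/* Stand-in for the PML calling add_procs with exactly one proc. */
static void add_one_proc(proc_table_t *t, proc_t *p) {
    p->added = 1;
    t->add_calls++;
}

/* "I need a proc for communication": create and add the peer on first use.
 * The lock makes the check-then-create step safe across threads, so a peer
 * is added at most once until it is deleted. This is still a blocking call. */
static proc_t *get_proc_for_comm(proc_table_t *t, int rank) {
    proc_t *p = NULL;
    if (rank < 0 || rank >= t->size) return NULL;
    pthread_mutex_lock(&t->lock);
    p = t->procs[rank];
    if (NULL == p) {
        p = calloc(1, sizeof(*p));
        if (NULL != p) {
            p->rank = rank;
            add_one_proc(t, p);
            t->procs[rank] = p;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return p;
}

/* "I need a proc for non-communication": look only, never trigger an add. */
static proc_t *peek_proc(proc_table_t *t, int rank) {
    proc_t *p = NULL;
    if (rank < 0 || rank >= t->size) return NULL;
    pthread_mutex_lock(&t->lock);
    p = t->procs[rank];
    pthread_mutex_unlock(&t->lock);
    return p;
}

/* del_procs analogue: release the memory and NULL the entry so the next
 * lookup sees NULL and can re-add the peer if needed. */
static void del_proc(proc_table_t *t, int rank) {
    if (rank < 0 || rank >= t->size) return;
    pthread_mutex_lock(&t->lock);
    free(t->procs[rank]);
    t->procs[rank] = NULL;
    pthread_mutex_unlock(&t->lock);
}
```

In this sketch a first send to rank 2 would call `get_proc_for_comm(&table, 2)`, which adds the peer once; later sends reuse the same pointer. Preconnect-all or connect/accept would instead loop over every rank. Polling BTLs, the upcall path for messages from unknown peers, and the impact on MTLs are not modeled here.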
  • COMM_SPLIT_TYPE PR: https://github.com/open-mpi/ompi/pull/326 -- what about IP issues?

    • Jeff added request to PR that the author mark it as released as BSD so we can properly ingest it
    • George to contact offlist to discuss enhancements

  • Edgar: extracting libnbc core from the collective component into a standalone directory such that it can be used from OMPIO and other locations

    • move the libnbc core portions into a subdirectory in ompi
    • modification to libnbc will include new read/write primitives as well as new send/recv primitives with an additional indirection level for buffer pointers.
  • Ralph: Review: v1.8 series / RM experience with Github and Jenkins and the release process

    • Ralph's feedback: lots more PRs than we used to have CMRs
    • Ralph's feedback: people seem to be relying on Jenkins for correctness, when Jenkins is really just a smoke test
    • Github fans will look at creating some helpful scripts to support MTT testing of PRs
  • Ralph: PMIx update

    • Given orally at meeting
  • Ralph: Data passing down to OPAL

    • Revising process naming scheme
    • MPI_Info
      • OPAL_info (renamed) object and typedef it at the OMPI layer
        • Dave Solt from IBM volunteered
    • Error response propagation (e.g., BTL error propagation up from OPAL into ORTE and OMPI, particularly in the presence of async progress).
      • Create opal_errhandler registration, call that function with errcode and remote process involved (if applicable) when encountering error that cannot be propagated upward (e.g., async progress thread)
        • Ralph will move the orte_event_base + progress thread down to OPAL
        • Ralph will provide opal_errhandler registration and callback mechanism
        • Ralph will integrate the pmix progress thread to the OPAL one
        • opal_event_base priority reservations:
          • error handler (top)
          • next 4 levels for BTLs
          • lowest 3 levels for ORTE/RTE
  • Howard: Progress on async progress

    • What happened to this proposal: http://www.open-mpi.org/community/lists/devel/2014/02/14170.php

    • Ralph will implement a global opal_event_base as part of the error response, as per above
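
The error handler registration and the event base priority reservations discussed under "Data passing down to OPAL" could look roughly like the C sketch below. Everything named here is an assumption made for illustration: `errhandler_register`, `errhandler_invoke`, `record_error`, and the `PRI_*` constants are hypothetical, and the notes do not specify the real `opal_errhandler` signature. The layout also assumes the three reservations are contiguous (8 levels in total) and that, as in libevent, priority 0 is served first.

```c
#include <stddef.h>

/* Assumed contiguous layout of the priority reservations from the notes. */
enum {
    PRI_ERRHANDLER = 0,   /* error handler (top) */
    PRI_BTL_FIRST  = 1,   /* next 4 levels for BTLs */
    PRI_BTL_LAST   = 4,
    PRI_RTE_FIRST  = 5,   /* lowest 3 levels for ORTE/RTE */
    PRI_RTE_LAST   = 7,
    PRI_COUNT      = 8
};

/* Marker for errors with no remote process involved ("if applicable"). */
#define NO_REMOTE_PROC (-1)

/* Hypothetical callback type: error code plus the remote process involved. */
typedef void (*errhandler_fn_t)(int errcode, int remote_proc, void *cbdata);

static errhandler_fn_t registered_fn = NULL;
static void *registered_cbdata = NULL;

/* An upper layer (e.g., ORTE or OMPI) registers its handler once. */
static void errhandler_register(errhandler_fn_t fn, void *cbdata) {
    registered_fn = fn;
    registered_cbdata = cbdata;
}

/* Called where an error cannot be returned upward, e.g. from an async
 * progress thread. Returns 0 if a handler ran, -1 if none is registered. */
static int errhandler_invoke(int errcode, int remote_proc) {
    if (NULL == registered_fn) return -1;
    registered_fn(errcode, remote_proc, registered_cbdata);
    return 0;
}

/* Example handler that simply records the last error it was given. */
typedef struct {
    int errcode;
    int remote_proc;
    int calls;
} err_record_t;

static void record_error(int errcode, int remote_proc, void *cbdata) {
    err_record_t *rec = (err_record_t *)cbdata;
    rec->errcode = errcode;
    rec->remote_proc = remote_proc;
    rec->calls++;
}
```

A caller would register with `errhandler_register(record_error, &rec)` and a BTL would report with `errhandler_invoke(errcode, peer)` or `NO_REMOTE_PROC`. The sketch keeps registration single-threaded for brevity; a version used with a real progress thread would need to synchronize access to the registered handler.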

Presentation Material
