
January 2015 OMPI Developer's Meeting

This is a standalone meeting; it is not being held in conjunction with an MPI Forum meeting.

Logistics

Doodle for choosing the date: https://doodle.com/zzaupgxge9y6medu

  • Date: 9am Tuesday, January 27 through 3pm Thursday, January 29, 2015
  • Location: Cisco Richardson facility (outside Dallas), building 4:

Cisco Building 4
2200 East President George Bush Highway
Richardson, Texas 75082-3550

Google maps link: https://goo.gl/maps/SNrbu

Attendees

Local attendees:

  • (*) Jeff Squyres - Cisco
  • (*) Howard Pritchard - Los Alamos
  • (*) Ralph Castain - Intel
  • (*) George Bosilca - U. Tennessee, Knoxville
  • (*) Dave Goodell - Cisco
  • (*) Edgar Gabriel - U. Houston
  • (*) Vish Venkatesan (not Tuesday) - Intel
  • (*) Geoff Paulsen - IBM
  • (*) Joshua Ladd - Mellanox Technologies
  • (*) Rayaz Jagani - IBM
  • (*) Dave Solt - IBM
  • (*) Perry Schmidt - IBM
  • (*) Naoyuki Shida - Fujitsu
  • (*) Shinji Sumimoto - Fujitsu
  • (*) Stan Graves - IBM
  • (*) Mark Allen - IBM
  • ...please add your name if you plan to attend...

(*) = Registered (by Jeff)

Remote attendees:

  • Nathan Hjelm - Los Alamos
  • Ryan Grant - Sandia (planning to attend for the MTL and 1.9 branch discussions)

Topics still to discuss

Wed afternoon (in priority order)

Thurs morning

  • Ralph: ORCM update
    • Roadmap
    • Instant On launch planning
  • Jeff: Progress on thread-multiple support
  • Ralph: Collective switching points & MPI tuning params - what is required to change them? Mellanox raised this in an earlier discussion that we never finished.
  • Intel/LANL: MTL selection issue (PSM vs. OFI)
  • Nathan: Enhance MTL interface to include one-sided and atomics

Deferred

  • Ralph: RTE-MPI sharing of BTLs

Since this will be a full meeting in itself, we'll have a good amount of time for discussion, design, and hacking!

Resolved

  • Jeff/Howard: Branch for v1.9

    • See Releasev19 wiki page
    • We need to make a list of features for v1.9.0 to see if we're ready to branch yet
  • Jeff: libtool 2.4.4 bug / libltdl may no longer be embeddable. Should we embed manually, or should we just tell people to have libltdl-devel installed?

    • Resolved: let's stop embedding; we'll always link against external libltdl.
    • However: this means people need to have the libltdl headers installed (e.g., the libltdl-devel RPM). We don't mind telling developers to do this, but we are a little worried about telling users to do this, because it raises the bar for building Open MPI: libltdl-devel is almost certainly not installed on most user machines. (A sketch of why the header is needed follows this item.)
    • The question becomes: what is configure's default behavior when it can't find ltdl.h?
      1. Abort
      2. Just fall back to --disable-dlopen behavior (i.e., slurp in plugins)
    • Let's bring up the "default behavior" issue as an RFC / beer discussion.
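
    For context, a minimal sketch of what the external-libltdl path looks like to a consumer: building this requires ltdl.h (hence the libltdl-devel concern above) and linking with -lltdl. The component filename below is hypothetical.

    ```c
    #include <stdio.h>
    #include <ltdl.h>   /* the header that libltdl-devel provides */

    int main(void)
    {
        if (0 != lt_dlinit()) {
            fprintf(stderr, "lt_dlinit failed: %s\n", lt_dlerror());
            return 1;
        }

        /* With --disable-dlopen this step disappears: components are
           compiled ("slurped") directly into the library instead. */
        lt_dlhandle handle = lt_dlopen("mca_btl_tcp.la");   /* hypothetical filename */
        if (NULL == handle) {
            fprintf(stderr, "lt_dlopen failed: %s\n", lt_dlerror());
        } else {
            lt_dlclose(handle);
        }

        lt_dlexit();
        return 0;
    }
    ```
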
  • Jeff/Howard: Jenkins integration with Github:

    • how do we do multiple Jenkins servers? (e.g., running at different organizations)
    • Much discussion in the room. It seems like a good idea to have multiple Jenkins servers polling GitHub and running their own smoke tests. We need to figure out how to have them report results. Mike Dubman / Eugene V / Dave G will investigate how to do this.
  • Howard/George: fate of coll ML

    • See http://www.open-mpi.org/community/lists/devel/2015/01/16820.php
    • Who owns it?
    • Should we try to fix it, or disable it by default?
    • A point was raised that coll/ml is very expensive during communicator creation -- including MPI_COMM_WORLD. Should we delete coll/ml? George asked Pasha; Pasha is checking.
    • Pasha: disable it for now; ORNL will fix and re-enable it.
    • DONE: George opal_ignore'd the coll/ml component.

  • Ralph: Scalable startup, including:

    • Current state of opal_pmix integration
    • Async modex, static endpoint support
    • Re-define the role of PML/BTL add_procs: need to move to a lazier, on-demand setup of peers
    • Memory footprint reduction
    • Resolved:
    • Revive sparse groups
      • Edgar checked: passes smoke test today
      • first phase: replace ompi_proc_t array with pointer array to ompi_proc_t's
        • investigate further reduction in footprint
          • very simple, one-way static setup of the group hash, currently optimized for MCW
    • remove add_procs from MPI_Init unless preconnect is called (see the sketch after this list)
      • PML calls add_procs with 1 proc on first send to peer
        • need a centralized, thread-safe method to check whether we need to create a proc
        • may need to poll BTLs, etc. Expensive! Async? Must also be thread-safe
        • still a blocking call
        • Nathan: if one-sided calls BTLs directly, then it needs to check/call add_procs
      • call add_procs with all procs for preconnect-all and in connect/accept, or if the PML component indicates it needs add_procs called with all procs
      • need to check with the MTL owners on the impact to them
      • a peer proc will be add_proc'd at most once before it is del_proc'd
    • del_procs needs to release memory and NULL the proc entry to ensure that you get NULL when you next look for the proc
    • differentiate between "I need a proc for..."
      • communication
      • non-communication
    • need to check BTL/MTLs to see how they handle messages from peers that we don't have an ompi_proc_t for
      • need way for BTL/MTL to upcall the PML with the message so the PML can create a new ompi_proc_t, call add_proc, handle message
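
    A hedged sketch of the lazy add_procs scheme resolved above: a sparse pointer array (the first-phase footprint reduction), a centralized thread-safe lookup-or-create, and a del_procs counterpart that NULLs the entry. All type and function names here are simplified stand-ins, not the real OMPI interfaces.

    ```c
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct proc {
        int vpid;               /* peer identity within the job */
        /* ...endpoint data filled in by add_procs... */
    } proc_t;

    #define JOB_SIZE 4096

    /* Pointer array instead of a dense array of structs: entries stay
       NULL until a peer is actually used. */
    static proc_t *procs[JOB_SIZE];
    static pthread_mutex_t procs_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for PML/BTL endpoint setup; still a blocking call. */
    static int pml_add_procs(proc_t **p, size_t nprocs)
    {
        printf("add_procs called for %zu proc(s), vpid %d\n", nprocs, p[0]->vpid);
        return 0;
    }

    /* The centralized, thread-safe check-or-create: the PML calls this
       on first send to a peer, so add_procs runs for exactly one proc,
       at most once before that proc is del_proc'd. */
    proc_t *proc_get(int vpid)
    {
        pthread_mutex_lock(&procs_lock);
        proc_t *p = procs[vpid];
        if (NULL == p) {
            p = calloc(1, sizeof(*p));
            p->vpid = vpid;
            procs[vpid] = p;
            pml_add_procs(&p, 1);
        }
        pthread_mutex_unlock(&procs_lock);
        return p;
    }

    /* del_procs counterpart: release the memory and NULL the entry so
       the next lookup correctly sees that the proc is gone. */
    void proc_del(int vpid)
    {
        pthread_mutex_lock(&procs_lock);
        free(procs[vpid]);
        procs[vpid] = NULL;
        pthread_mutex_unlock(&procs_lock);
    }

    int main(void)
    {
        proc_get(7);      /* first use: triggers add_procs */
        proc_get(7);      /* already known: no add_procs */
        proc_del(7);
        return 0;
    }
    ```
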
  • COMM_SPLIT_TYPE PR: https://github.com/open-mpi/ompi/pull/326 -- what about IP issues?

    • Jeff added a request to the PR that the author mark it as released under the BSD license so we can properly ingest it
    • George to contact the author off-list to discuss enhancements

  • Edgar: extracting libnbc core from the collective component into a standalone directory such that it can be used from OMPIO and other locations

    • move the libnbc core portions into a subdirectory in ompi
    • modifications to libnbc will include new read/write primitives, as well as new send/recv primitives with an additional level of indirection for buffer pointers (see the sketch after this list)
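
    A hedged sketch of what that extra indirection level could look like: a schedule entry stores a pointer to the buffer pointer, so the actual address is bound when the entry executes, not when the schedule is built (useful when OMPIO supplies I/O buffers late). The struct and names are hypothetical, not the real libnbc types.

    ```c
    #include <stdio.h>
    #include <stddef.h>

    typedef enum { OP_SEND, OP_RECV, OP_READ, OP_WRITE } op_kind_t;

    typedef struct sched_entry {
        op_kind_t kind;
        void **buf;       /* extra indirection level */
        size_t count;
        int peer;         /* rank (or file-handle index for READ/WRITE) */
    } sched_entry_t;

    /* The execution engine dereferences the buffer pointer at run time. */
    static void *entry_buffer(const sched_entry_t *entry)
    {
        return *entry->buf;
    }

    int main(void)
    {
        char a[16], b[16];
        void *cur = a;
        sched_entry_t e = { OP_SEND, &cur, sizeof(a), 1 };

        printf("buffer before rebind: %p\n", entry_buffer(&e));
        cur = b;   /* late binding: rebind the buffer, no schedule rebuild */
        printf("buffer after rebind:  %p\n", entry_buffer(&e));
        return 0;
    }
    ```
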
  • Ralph: Review: v1.8 series / RM experience with Github and Jenkins and the release process

    • Ralph's feedback: lots more PRs than we used to have CMRs
    • Ralph's feedback: people seem to be relying on Jenkins for correctness, when Jenkins is really just a smoke test
    • Github fans will look at creating some helpful scripts to support MTT testing of PRs
  • Ralph: PMIx update

    • Given orally at meeting
  • Ralph: Data passing down to OPAL

    • Revising process naming scheme
    • MPI_Info
      • an OPAL_info (renamed) object, typedef'd at the OMPI layer
        • Dave Solt from IBM volunteered
    • Error response propagation (e.g., BTL error propagation up from OPAL into ORTE and OMPI, particularly in the presence of async progress).
      • Create opal_errhandler registration; call that function with the errcode and the remote process involved (if applicable) when encountering an error that cannot be propagated upward (e.g., from an async progress thread). A sketch follows this list.
        • Ralph will move the orte_event_base + progress thread down to OPAL
        • Ralph will provide opal_errhandler registration and callback mechanism
        • Ralph will integrate the pmix progress thread to the OPAL one
        • opal_event_base priority reservations:
          • error handler (top)
          • next 4 levels for BTLs
          • lowest 3 levels for ORTE/RTE
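
    A minimal sketch of the opal_errhandler registration/callback mechanism described above, assuming a simplified process-name type; the function names are illustrative, not the final OPAL API.

    ```c
    #include <stdio.h>
    #include <stddef.h>

    typedef struct { int jobid, vpid; } opal_proc_name_t;   /* simplified */

    typedef void (*opal_errhandler_fn_t)(int errcode,
                                         const opal_proc_name_t *proc);

    static opal_errhandler_fn_t errhandler = NULL;

    /* Upper layers (ORTE/OMPI) register a single callback. */
    void opal_register_errhandler(opal_errhandler_fn_t fn)
    {
        errhandler = fn;
    }

    /* Called when an error cannot be propagated up the call stack, e.g.,
       it was detected on the async progress thread.  In the real design
       this would fire as a top-priority event on the opal event base
       (level 0: error handler; levels 1-4: BTLs; levels 5-7: ORTE/RTE). */
    static void report_async_error(int errcode, const opal_proc_name_t *proc)
    {
        if (NULL != errhandler) {
            errhandler(errcode, proc);   /* proc may be NULL if not applicable */
        }
    }

    /* Example handler an upper layer might register. */
    static void my_handler(int errcode, const opal_proc_name_t *proc)
    {
        fprintf(stderr, "async error %d from proc %d.%d\n", errcode,
                proc ? proc->jobid : -1, proc ? proc->vpid : -1);
    }

    int main(void)
    {
        opal_register_errhandler(my_handler);
        opal_proc_name_t peer = { 1, 3 };
        report_async_error(42, &peer);
        return 0;
    }
    ```
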
  • Howard: Progress on async progress

  • Nathan: --disable-smp-locks: remove this option?

    • See RFC email http://www.open-mpi.org/community/lists/devel/2015/01/16736.php
    • See, in particular, George's replies
    • In short: atomics are only used when multi-threading is enabled. But sm and vader need the smp locks.
    • However, people are discovering --disable-smp-locks, and using it breaks sm/vader.
    • OMPI atomic functions (see the sketch after this list):
      • CAPS versions: only enabled when opal_using_threads() is true, which is only set via opal_set_using_threads(true), which happens only when we are MPI_THREAD_MULTIPLE
      • lowercase versions: only enabled when --enable-smp-locks is set
    • George misunderstood the issue. Now he understands and agrees with Nathan: remove the --enable-smp-locks option.
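
    To illustrate the two flavors described above, a hedged sketch using GCC __atomic builtins as stand-ins for OPAL's real assembly atomics; the names mimic the OPAL convention but are simplified.

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static bool opal_using_threads_flag = false;
    static inline bool opal_using_threads(void) { return opal_using_threads_flag; }

    /* lowercase flavor: always a real atomic.  This is what sm/vader
       need for cross-process synchronization regardless of
       MPI_THREAD_MULTIPLE, and why --disable-smp-locks broke them. */
    static inline int32_t opal_atomic_add_32(volatile int32_t *addr, int32_t v)
    {
        return __atomic_add_fetch(addr, v, __ATOMIC_SEQ_CST);
    }

    /* CAPS flavor: pays the atomic cost only when threads are on. */
    #define OPAL_ATOMIC_ADD_32(addr, v)                         \
        (opal_using_threads() ? opal_atomic_add_32((addr), (v)) \
                              : (*(addr) += (v)))

    int main(void)
    {
        volatile int32_t counter = 0;
        OPAL_ATOMIC_ADD_32(&counter, 1);   /* plain add: single-threaded */
        opal_using_threads_flag = true;    /* e.g., MPI_THREAD_MULTIPLE */
        OPAL_ATOMIC_ADD_32(&counter, 1);   /* real atomic now */
        printf("counter = %d\n", (int)counter);
        return 0;
    }
    ```
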
  • Nathan: Performance of freelists and other common OPAL classes with OPAL_ENABLE_MULTI_THREADS==1 (as discussed in [GitHub]). Part of this is done already -- LIFO is a bit faster now (with threads), etc.

    • This is pretty much already resolved (after this item was added to the agenda) -- a fix went in on master for this, and a different fix went in for v1.8.
    • So the issue is now moot. Yay!
  • Vish: Memkind integration: see http://www.open-mpi.org/community/lists/devel/2014/11/16320.php

    • Vish has slides that he will post here.
    • We all generally agree that memkind introduces some new, desirable functionality
    • With some discussion in the room, it seems "easy" to add this functionality to MPI_ALLOC_MEM/MPI_FREE_MEM.
    • We decided that it's quite hard to know how to use this internally in the rest of the OMPI code base right now. We assume we will want to use it; we just don't know how yet (there are many variables). So let's get some experience with memkind in MPI_ALLOC_MEM first and revisit how to use this internally in the rest of the code base.
    • Here are the 4 steps we think we need to take (a sketch of step 4 follows this item):
      1. remove "allocator" framework use from ob1, replace it with malloc (because the use of allocator there seems to be pretty useless)
      2. create new allocator modules for things like:
        • posix_memalign
        • mmap
        • malloc
        • ...?
      3. change the mpool framework/modules to use allocator modules to get memory
      4. update MPI_Alloc_mem to:
        • lazily create allocator modules from memkind as each memkind type is requested
        • make an mpool with that allocator
        • allocate memory from the mpool associated with that memkind allocator type
        • (somehow) register the memory with all other mpools (e.g., mpools in use by the BTLs)
        • MPI_FREE_MEM needs to unregister with all mpools (probably already done?)
        • MPI_FREE_MEM needs to return the memory to the right mpool
    • Nathan and Vish will coordinate to move forward on this.
    • George and Nathan are digging in to ensure that allocator is not already being used in a way that will be problematic. The ob1 usage seems to be understood / OK to change. The sm mpool needs to be investigated -- it uses allocator, too.
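
    A hedged sketch of step 4: MPI_ALLOC_MEM choosing a memkind based on an info key. The key name "memkind_kind" and the dispatch logic are assumptions for illustration; memkind_malloc/memkind_free and MEMKIND_HBW/MEMKIND_DEFAULT are the actual memkind API.

    ```c
    #include <string.h>
    #include <mpi.h>
    #include <memkind.h>

    /* Hypothetical MPI_ALLOC_MEM internals: pick a memkind from an info
       key, then allocate.  Real code would also create/cache an mpool per
       kind and register the memory with the BTL mpools. */
    void *alloc_mem_with_memkind(size_t size, MPI_Info info)
    {
        memkind_t kind = MEMKIND_DEFAULT;
        char value[64];
        int flag = 0;

        if (MPI_INFO_NULL != info) {
            /* "memkind_kind" is an assumed key name, not a standard one */
            MPI_Info_get(info, "memkind_kind", sizeof(value) - 1, value, &flag);
            if (flag && 0 == strcmp(value, "hbw")) {
                kind = MEMKIND_HBW;   /* high-bandwidth memory */
            }
        }
        return memkind_malloc(kind, size);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        void *buf = alloc_mem_with_memkind(1024, MPI_INFO_NULL);
        /* MPI_FREE_MEM would need to return this to the right mpool;
           here we know it came from MEMKIND_DEFAULT. */
        memkind_free(MEMKIND_DEFAULT, buf);
        MPI_Finalize();
        return 0;
    }
    ```
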
  • Fujitsu: future plans for Open MPI development

    • Shinji will post slides here.
  • Ralph/Nathan: MTL overhead reduction

    • ...more...
  • Jeff: MPI extensions: MPIX_ prefix, or OMPI_ prefix?

    • Just a discussion between Jeff and George.

Presentation Material
