Merge pull request #363 from zippylab/auroraMPICH_envVars
Add section for Aurora MPICH, including path to README.envvar
cjknight authored Mar 5, 2024
2 parents b51f9aa + 0d97df6 commit 731c28d
12 changes: 12 additions & 0 deletions docs/aurora/running-jobs-aurora.md
@@ -46,6 +46,18 @@ We recommend against using `-W tolerate_node_failures=all` in your qsub command

Note that any node marked as faulty by PBS will not be used in subsequent jobs. This mechanism only provides a means to execute additional `mpiexec` commands within the same interactive job after manually removing the nodes identified as faulty. Once your PBS job has exited, those faulty nodes remain offline until further intervention by Aurora staff.
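One way to continue working within the same interactive job is to filter the faulty node out of the PBS-provided node list before the next launch. The sketch below illustrates this; the node names are placeholders, and the `--hostfile` relaunch line assumes Cray PALS `mpiexec` option names.

```shell
# In a real interactive job PBS sets $PBS_NODEFILE; create a stand-in
# node list here so the sketch is self-contained.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=demo_nodes.txt
    printf 'x4209c5s1b0n0\nx4209c5s2b0n0\nx4209c5s3b0n0\n' > "$PBS_NODEFILE"
fi

# Node you have identified as faulty (placeholder name).
BAD_NODE="x4209c5s2b0n0"

# Write a reduced host list without the faulty node.
grep -v "$BAD_NODE" "$PBS_NODEFILE" > good_nodes.txt

# Relaunch on the remaining nodes (option names assume Cray PALS):
#   mpiexec --hostfile good_nodes.txt --ppn 12 ./my_app
```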

## <a name="Aurora-MPICH"></a>Aurora MPICH

The standard MPI (Message Passing Interface) library on Aurora is *Aurora MPICH*, developed through a collaboration between Intel and the Argonne MPICH developer team. The `mpiexec` and `mpirun` commands used to launch multi-rank jobs come from the Cray PALS (Parallel Application Launch Service) system.

Aurora MPICH exposes a large number of configuration and tuning parameters. Plain-text documentation of the environment variables that control its behavior is available at

```
$MPI_ROOT/share/doc/mpich/README.envvar
```

This includes, for example, settings to select different optional sub-algorithms used in MPI collective operations.
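As a sketch of how such a variable might be applied: MPICH exposes its tuning knobs as `MPIR_CVAR_*` environment variables, and the algorithm name used below is illustrative only. Consult `README.envvar` for the exact variable names and accepted values on your installation before relying on any of these.

```shell
# Browse the environment-variable documentation (path from above):
#   less "$MPI_ROOT/share/doc/mpich/README.envvar"

# Hypothetical example: request a specific allreduce sub-algorithm.
# Verify the variable name and value against README.envvar first.
export MPIR_CVAR_ALLREDUCE_INTRA_ALGORITHM=recursive_doubling

# Then launch as usual:
#   mpiexec -n 12 ./my_app
```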

## <a name="Running-MPI+OpenMP-Applications"></a>Running MPI+OpenMP Applications

Once a submitted job is running, calculations can be launched on the compute nodes using `mpiexec` to start an MPI application. Documentation is accessible via `man mpiexec`, and some helpful options follow.
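A minimal sketch of such a launch, assuming Cray PALS `mpiexec` option names (`-n`, `--ppn`, `--depth`, `--cpu-bind`, `--env`); the node, rank, and thread counts are placeholders for illustration:

```shell
# Placeholder sizing for a hybrid MPI+OpenMP run.
NNODES=2            # nodes allocated to the job
RANKS_PER_NODE=12   # MPI ranks per node
THREADS_PER_RANK=8  # OpenMP threads per rank

# Total MPI ranks across the allocation.
NTOTRANKS=$(( NNODES * RANKS_PER_NODE ))

# The launch command you would run inside the job:
echo "mpiexec -n ${NTOTRANKS} --ppn ${RANKS_PER_NODE} \
  --depth=${THREADS_PER_RANK} --cpu-bind depth \
  --env OMP_NUM_THREADS=${THREADS_PER_RANK} ./my_app"
```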