Periodic boundaries (#60)
* Pass args for periodicity, define private members, provide public getters
* Pass px and py through main
* Pass px and py through ZoltanPartitioner.partition
* Fix misc typos
* Discover periodic neighbours; update integration test KGOs
* Subdomains can be periodic neighbours of themselves
* Fix filenames
* Clearer wording for get_neighbours
* Put integration tests in loop
* Raise error if invalid order
* Add parallel versions of test_1; colour pass message
* Enable associative arrays on Mac by updating Bash
jwallwork23 authored Nov 5, 2024
1 parent 1fc054b commit ee789c9
Showing 17 changed files with 617 additions and 42 deletions.
1 change: 1 addition & 0 deletions .github/workflows/actions.yml
@@ -46,6 +46,7 @@ jobs:
- uses: actions/checkout@v3
- name: Install dependencies
run: |
brew install bash
brew install cmake
brew install boost
brew install hdf5-mpi
20 changes: 14 additions & 6 deletions Grid.cpp
@@ -1,6 +1,6 @@
/*!
* @file Grid.cpp
* @date 23 Aug 2024
* @date 05 Nov 2024
* @author Athena Elafrou <[email protected]>
*/

@@ -34,16 +34,18 @@ static std::vector<int> find_factors(const int n)
}
}

Grid* Grid::create(MPI_Comm comm, const std::string& filename, bool ignore_mask)
Grid* Grid::create(MPI_Comm comm, const std::string& filename, bool ignore_mask, bool px, bool py)
{
return new Grid(comm, filename, "x", "y", std::vector<int>({ 1, 0 }), "mask", ignore_mask);
return new Grid(
comm, filename, "x", "y", std::vector<int>({ 1, 0 }), "mask", ignore_mask, px, py);
}

Grid* Grid::create(MPI_Comm comm, const std::string& filename, const std::string xdim_name,
const std::string ydim_name, const std::vector<int> dim_order, const std::string mask_name,
bool ignore_mask)
bool ignore_mask, bool px, bool py)
{
return new Grid(comm, filename, xdim_name, ydim_name, dim_order, mask_name, ignore_mask);
return new Grid(
comm, filename, xdim_name, ydim_name, dim_order, mask_name, ignore_mask, px, py);
}

void Grid::ReadGridExtents(const std::string& filename)
@@ -129,10 +131,12 @@ void Grid::ReadGridMask(const std::string& filename, const std::string& mask_nam

Grid::Grid(MPI_Comm comm, const std::string& filename, const std::string& xdim_name,
const std::string& ydim_name, const std::vector<int>& dim_order, const std::string& mask_name,
bool ignore_mask)
bool ignore_mask, bool px, bool py)
: _comm(comm)
, _dim_names({ xdim_name, ydim_name })
, _dim_order(dim_order)
, _px(px)
, _py(py)
{

CHECK_MPI(MPI_Comm_rank(comm, &_rank));
@@ -197,6 +201,10 @@ int Grid::get_num_objects() const { return _num_objects; }

int Grid::get_num_nonzero_objects() const { return _num_nonzero_objects; }

bool Grid::get_px() const { return _px; }

bool Grid::get_py() const { return _py; }

std::vector<int> Grid::get_num_procs() const { return _num_procs; }

std::vector<int> Grid::get_global_ext() const { return _global_ext; }
32 changes: 26 additions & 6 deletions Grid.hpp
@@ -1,7 +1,7 @@
/*!
* @file Grid.hpp
* @author Athena Elafrou <[email protected]>
* @date 23 Aug 2024
* @date 05 Nov 2024
*/

#pragma once
@@ -67,18 +67,23 @@ class LIB_EXPORT Grid {
*
* @param comm MPI communicator.
* @param filename Grid file in netCDF format.
* @param dim0_name Name of 1st grid dimension in netCDF file (optional)
* @param dim1_name Name of 2nd grid dimension in netCDF file (optional)
* @param xdim_name Name of 1st grid dimension in netCDF file (optional)
* @param ydim_name Name of 2nd grid dimension in netCDF file (optional)
* @param dim_order Permutation for ordering of dimensions.
* @param mask_name Name of land mask variable in netCDF file (optional)
* @param ignore_mask Should the land mask be ignored
* @param px Is the domain periodic in the x-direction
* @param py Is the domain periodic in the y-direction
* @return A distributed 2D grid object partitioned evenly in terms of grid
* points.
*/
// We are using the named constructor idiom so that objects can only be
// created on the heap to ensure its dtor is executed before MPI_Finalize()
static Grid* create(MPI_Comm comm, const std::string& filename, bool ignore_mask = false);
static Grid* create(MPI_Comm comm, const std::string& filename, bool ignore_mask = false,
bool px = false, bool py = false);
static Grid* create(MPI_Comm comm, const std::string& filename, const std::string xdim_name,
const std::string ydim_name, const std::vector<int> dim_order, const std::string mask_name,
bool ignore_mask = false);
bool ignore_mask = false, bool px = false, bool py = false);

/*!
* @brief Returns the total number of objects in the local domain.
@@ -131,6 +136,18 @@
*/
const int* get_land_mask() const;

/*!
* @brief Returns `true` if the grid is periodic in the x-direction, otherwise `false`.
* @return Periodicity in x-direction
*/
bool get_px() const;

/*!
* @brief Returns `true` if the grid is periodic in the y-direction, otherwise `false`.
* @return Periodicity in y-direction
*/
bool get_py() const;

/*!
* @brief Returns the index mapping of sparse to dense representation, where
* dim0 is the 1st dimension and dim1 is the 2nd, with dim1 varying the
@@ -164,7 +181,8 @@
Grid(MPI_Comm comm, const std::string& filename, const std::string& dim0_id = "x",
const std::string& dim1_id = "y",
const std::vector<int>& dim_order = std::vector<int>({ 1, 0 }),
const std::string& mask_id = "mask", bool ignore_mask = false);
const std::string& mask_id = "mask", bool ignore_mask = false, bool px = false,
bool py = false);

/*!
* @brief Read dims from netcdf grid file.
@@ -215,6 +233,8 @@

int _num_objects = 0; // Number of grid points ignoring land mask
int _num_nonzero_objects = 0; // Number of non-land grid points
bool _px = false; // Periodicity in the x-direction
bool _py = false; // Periodicity in the y-direction
std::vector<int> _land_mask = {}; // Land mask values
std::vector<int> _local_id = {}; // Map from sparse to dense index
std::vector<int> _global_id = {}; // Unique non-land grid point IDs
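For orientation, here is a minimal usage sketch of the extended factory (illustrative only, not taken from the repository; the grid filename is a placeholder). As the named-constructor comment above notes, the object lives on the heap and must be destroyed before MPI_Finalize().

```cpp
#include <iostream>
#include <mpi.h>

#include "Grid.hpp"

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    // "grid.nc" is a hypothetical filename; request periodicity in x only.
    Grid* grid = Grid::create(MPI_COMM_WORLD, "grid.nc", /* ignore_mask = */ false,
        /* px = */ true, /* py = */ false);
    std::cout << "periodic in x: " << grid->get_px() << ", periodic in y: " << grid->get_py()
              << std::endl;
    delete grid; // destroy the Grid before MPI_Finalize(), as the class design requires
    MPI_Finalize();
    return 0;
}
```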
106 changes: 104 additions & 2 deletions Partitioner.cpp
@@ -1,7 +1,7 @@
/*!
* @file Partitioner.cpp
* @author Athena Elafrou <[email protected]>
* @date 12 Sep 2024
* @date 05 Nov 2024
*/

#include "Partitioner.hpp"
@@ -42,6 +42,23 @@ void Partitioner::get_neighbours(
}
}

void Partitioner::get_neighbours_periodic(
std::vector<std::vector<int>>& ids_p, std::vector<std::vector<int>>& halo_sizes_p) const
{
// NOTE: Neighbours are looped in the order left-right-bottom-top
for (int d = 0; d < NDIMS; d++) {
if ((d == 0 && _px) || (d == 1 && _py)) {
for (int i = 0; i < 2; i++) {
int idx = 2 * d + i;
for (auto it = _neighbours_p[idx].begin(); it != _neighbours_p[idx].end(); ++it) {
ids_p[idx].push_back(it->first);
halo_sizes_p[idx].push_back(it->second);
}
}
}
}
}
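A hypothetical caller-side sketch (not part of this change) may help clarify the layout of the output: the four outer vectors must be pre-sized to NNBRS and are filled in the documented left-right-bottom-top order, with idx = 2 * d + i selecting the low and high side of dimension d.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

#include "Partitioner.hpp"

// Illustrative helper (assumed, not repository code): print this rank's
// periodic neighbours, relying on the left-right-bottom-top ordering.
void print_periodic_neighbours(const Partitioner& partitioner)
{
    const int nnbrs = 4; // NNBRS == 2 * NDIMS for a 2D grid
    std::vector<std::vector<int>> ids(nnbrs), halos(nnbrs);
    partitioner.get_neighbours_periodic(ids, halos);
    const char* dirs[] = { "left", "right", "bottom", "top" };
    for (int idx = 0; idx < nnbrs; idx++) {
        for (std::size_t k = 0; k < ids[idx].size(); k++) {
            std::cout << dirs[idx] << " periodic neighbour: rank " << ids[idx][k]
                      << " (halo size " << halos[idx][k] << ")" << std::endl;
        }
    }
}
```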

void Partitioner::save_mask(const std::string& filename) const
{
// Use C API for parallel I/O
@@ -113,6 +130,16 @@ void Partitioner::save_metadata(const std::string& filename) const
CHECK_MPI(MPI_Exscan(&num_neighbours[idx], &offsets[idx], 1, MPI_INT, MPI_SUM, _comm));
}

// Prepare periodic neighbour data
std::vector<std::vector<int>> ids_p(NNBRS), halos_p(NNBRS);
get_neighbours_periodic(ids_p, halos_p);
std::vector<int> num_neighbours_p(NNBRS), dims_p(NNBRS, 0), offsets_p(NNBRS, 0);
for (int idx = 0; idx < NNBRS; idx++) {
num_neighbours_p[idx] = (int)ids_p[idx].size();
CHECK_MPI(MPI_Allreduce(&num_neighbours_p[idx], &dims_p[idx], 1, MPI_INT, MPI_SUM, _comm));
CHECK_MPI(MPI_Exscan(&num_neighbours_p[idx], &offsets_p[idx], 1, MPI_INT, MPI_SUM, _comm));
}
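To make the collective bookkeeping above concrete (the numbers are hypothetical): if ranks 0, 1 and 2 have 1, 0 and 2 left periodic neighbours respectively, the MPI_Allreduce gives a global dimension of 3, while the MPI_Exscan gives write offsets of 0, 1 and 1, so each rank later writes its IDs and halo sizes into a disjoint slice of the shared netCDF variable. The offsets are pre-initialised to zero, which covers rank 0, whose MPI_Exscan result is undefined by the MPI standard.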

// Define dimensions in netCDF file
int dimid;
std::vector<int> dimids(NNBRS);
@@ -121,6 +148,13 @@
NC_CHECK(nc_def_dim(nc_id, dir_chars[idx].c_str(), dims[idx], &dimids[idx]));
}

// Define periodic dimensions in netCDF file
std::vector<int> dimids_p(NNBRS);
for (int idx = 0; idx < NNBRS; idx++) {
NC_CHECK(
nc_def_dim(nc_id, (dir_chars[idx] + "_periodic").c_str(), dims_p[idx], &dimids_p[idx]));
}

// Define groups in netCDF file
int bbox_gid, connectivity_gid;
NC_CHECK(nc_def_grp(nc_id, "bounding_boxes", &bbox_gid));
@@ -148,6 +182,19 @@ void Partitioner::save_metadata(const std::string& filename) const
NC_CHECK(nc_def_var(connectivity_gid, (dir_names[idx] + "_neighbour_halos").c_str(), NC_INT,
1, &dimids[idx], &halos_vid[idx]));
}
int num_vid_p[NNBRS];
int ids_vid_p[NNBRS];
int halos_vid_p[NNBRS];
for (int idx = 0; idx < NNBRS; idx++) {
// Periodic members of connectivity group
NC_CHECK(nc_def_var(connectivity_gid, (dir_names[idx] + "_neighbours_periodic").c_str(),
NC_INT, 1, &dimid, &num_vid_p[idx]));
NC_CHECK(nc_def_var(connectivity_gid, (dir_names[idx] + "_neighbour_ids_periodic").c_str(),
NC_INT, 1, &dimids_p[idx], &ids_vid_p[idx]));
NC_CHECK(
nc_def_var(connectivity_gid, (dir_names[idx] + "_neighbour_halos_periodic").c_str(),
NC_INT, 1, &dimids_p[idx], &halos_vid_p[idx]));
}

// Write metadata to file
NC_CHECK(nc_enddef(nc_id));
@@ -161,16 +208,30 @@
NC_CHECK(nc_put_var1_int(bbox_gid, cnt_vid[idx], &start, &_local_ext_new[idx]));
}
for (int idx = 0; idx < NNBRS; idx++) {
// Numbers of neighbours
size_t start = _rank;
NC_CHECK(nc_var_par_access(connectivity_gid, num_vid[idx], NC_COLLECTIVE));
NC_CHECK(nc_put_var1_int(connectivity_gid, num_vid[idx], &start, &num_neighbours[idx]));
// Numbers of neighbours for periodic dimensions
NC_CHECK(nc_var_par_access(connectivity_gid, num_vid_p[idx], NC_COLLECTIVE));
NC_CHECK(nc_put_var1_int(connectivity_gid, num_vid_p[idx], &start, &num_neighbours_p[idx]));
// IDs and halos
start = offsets[idx];
size_t count = num_neighbours[idx];
NC_CHECK(nc_var_par_access(connectivity_gid, ids_vid[idx], NC_COLLECTIVE));
NC_CHECK(nc_put_vara_int(connectivity_gid, ids_vid[idx], &start, &count, ids[idx].data()));
NC_CHECK(nc_var_par_access(connectivity_gid, halos_vid[idx], NC_COLLECTIVE));
NC_CHECK(
nc_put_vara_int(connectivity_gid, halos_vid[idx], &start, &count, halos[idx].data()));
// IDs and halos for periodic dimensions
start = offsets_p[idx];
count = num_neighbours_p[idx];
NC_CHECK(nc_var_par_access(connectivity_gid, ids_vid_p[idx], NC_COLLECTIVE));
NC_CHECK(
nc_put_vara_int(connectivity_gid, ids_vid_p[idx], &start, &count, ids_p[idx].data()));
NC_CHECK(nc_var_par_access(connectivity_gid, halos_vid_p[idx], NC_COLLECTIVE));
NC_CHECK(nc_put_vara_int(
connectivity_gid, halos_vid_p[idx], &start, &count, halos_p[idx].data()));
}
NC_CHECK(nc_close(nc_id));
}
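As a sketch of how this metadata could be read back (an assumption, not part of the commit: error checking is omitted and the variable name presumes dir_names[0] == "left", which is not shown in this diff):

```cpp
#include <vector>

#include <netcdf.h>

// Hypothetical serial reader for one direction's periodic neighbour IDs from
// the "connectivity" group of a partition metadata file.
std::vector<int> read_left_periodic_ids(const char* filename)
{
    int nc_id = 0, grp_id = 0, var_id = 0, dim_id = 0;
    size_t len = 0;
    nc_open(filename, NC_NOWRITE, &nc_id);
    nc_inq_ncid(nc_id, "connectivity", &grp_id);
    nc_inq_varid(grp_id, "left_neighbour_ids_periodic", &var_id); // name assumed
    nc_inq_vardimid(grp_id, var_id, &dim_id); // the variable has a single dimension
    nc_inq_dimlen(nc_id, dim_id, &len);
    std::vector<int> ids(len);
    if (len > 0) {
        nc_get_var_int(grp_id, var_id, ids.data());
    }
    nc_close(nc_id);
    return ids;
}
```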
@@ -260,13 +321,16 @@ void Partitioner::discover_neighbours()
}

for (int p = 0; p < _total_num_procs; p++) {

// When finding neighbours *within* the domain, we don't check against the current rank
// because a subdomain can't be a neighbour of itself.
if (p != _rank) {

// Find my left neighbours
// i.e., the left edge of domain[_rank] has to match the right edge of domain[p]
if (domains[_rank].p1.x == domains[p].p2.x) {
// next compute overlap (if overlap is zero then they are not neighbours)
// domain overlap is equivalent to the amount of data transfered in halo exchnage.
// domain overlap is equivalent to the amount of data transferred in halo exchange.
int halo_size = domainOverlap(domains[_rank], domains[p], 'y');
if (halo_size) {
_neighbours[0].insert(std::pair<int, int>(p, halo_size));
Expand Down Expand Up @@ -297,5 +361,43 @@ void Partitioner::discover_neighbours()
}
}
}

// When finding neighbours *across periodic boundaries*, we need to check against the current
// rank, too, because a subdomain can be a periodic neighbour of itself.
if (_px) {
// Find my left periodic neighbours
if ((domains[_rank].p1.x == 0) && (domains[p].p2.x == _global_ext[0])) {
int halo_size = domainOverlap(domains[_rank], domains[p], 'y');
if (halo_size) {
_neighbours_p[0].insert(std::pair<int, int>(p, halo_size));
}
}

// Find my right periodic neighbours
if ((domains[_rank].p2.x == _global_ext[0]) && (domains[p].p1.x == 0)) {
int halo_size = domainOverlap(domains[_rank], domains[p], 'y');
if (halo_size) {
_neighbours_p[1].insert(std::pair<int, int>(p, halo_size));
}
}
}

if (_py) {
// Find my bottom periodic neighbours
if ((domains[_rank].p1.y == 0) && (domains[p].p2.y == _global_ext[1])) {
int halo_size = domainOverlap(domains[_rank], domains[p], 'x');
if (halo_size) {
_neighbours_p[2].insert(std::pair<int, int>(p, halo_size));
}
}

// Find my top periodic neighbours
if ((domains[_rank].p2.y == _global_ext[1]) && (domains[p].p1.y == 0)) {
int halo_size = domainOverlap(domains[_rank], domains[p], 'x');
if (halo_size) {
_neighbours_p[3].insert(std::pair<int, int>(p, halo_size));
}
}
}
}
}
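A toy illustration (not repository code) of the matching rule used for the periodic x-neighbours: a subdomain whose left edge sits at 0 wraps onto any subdomain whose right edge sits at the global extent, and the two may be the same subdomain, which is why the periodic checks do not skip p == _rank.

```cpp
#include <cassert>

// Subdomain extent in global x-coordinates (illustrative only).
struct Box {
    int x0, x1;
};

// Mirror of the left-edge periodic test above: a's left edge wraps onto b's right edge.
bool left_periodic_match(const Box& a, const Box& b, int global_ext_x)
{
    return a.x0 == 0 && b.x1 == global_ext_x;
}

int main()
{
    const int global_ext_x = 10;
    assert(left_periodic_match(Box { 0, 5 }, Box { 5, 10 }, global_ext_x)); // two subdomains
    assert(left_periodic_match(Box { 0, 10 }, Box { 0, 10 }, global_ext_x)); // self-neighbour
    return 0;
}
```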
25 changes: 21 additions & 4 deletions Partitioner.hpp
@@ -1,7 +1,7 @@
/*!
* @file Partitioner.hpp
* @author Athena Elafrou <[email protected]>
* @date 11 Sep 2024
* @date 05 Nov 2024
*/

#pragma once
@@ -56,15 +56,27 @@ class LIB_EXPORT Partitioner {
void get_bounding_box(int& global_0, int& global_1, int& local_ext_0, int& local_ext_1) const;

/*!
* @brief Returns vectors containing the MPI ranks and halo sizes of the neighbours for this
* process after partitioning. The neighbours are ordered left, right, bottom, top.
* @brief Returns vectors containing the MPI ranks and halo sizes of the neighbours of this
* process in the domain interior after partitioning. The neighbours are ordered left, right,
* bottom, top.
*
* @param ids MPI ranks of the neighbours for each direction
* @param halo_sizes Halo sizes of the neighbours for each direction
*/
void get_neighbours(
std::vector<std::vector<int>>& ids, std::vector<std::vector<int>>& halo_sizes) const;

/*!
* @brief Returns vectors containing the MPI ranks and halo sizes of the neighbours of this
* process across periodic boundaries after partitioning. The neighbours are ordered left,
* right, bottom, top.
*
* @param ids MPI ranks of the neighbours for each direction
* @param halo_sizes Halo sizes of the neighbours for each direction
*/
void get_neighbours_periodic(
std::vector<std::vector<int>>& ids, std::vector<std::vector<int>>& halo_sizes) const;

/*!
* @brief Saves the partition IDs of the latest 2D domain decomposition in a
* NetCDF file.
@@ -103,7 +115,7 @@ class LIB_EXPORT Partitioner {
// created on the heap to ensure its dtor is executed before MPI_Finalize()
Partitioner(MPI_Comm comm);

// Discover the processe's neighbours and halo sizes after partitioning
// Discover the neighbours and halo sizes of the process after partitioning
void discover_neighbours();

protected:
@@ -112,6 +124,8 @@
int _total_num_procs = -1; // Total number of processes in communicator
static const int NDIMS = 2; // Number of dimensions
static const int NNBRS = 2 * NDIMS; // Number of neighbours (two per dimension)
bool _px = false; // Periodic boundary in the x-direction
bool _py = false; // Periodic boundary in the y-direction

// Letters used for each dimension
std::vector<std::string> dim_chars = { "x", "y" };
@@ -149,6 +163,9 @@ class LIB_EXPORT Partitioner {
// Vector of maps of neighbours to their halo sizes after partitioning
std::vector<std::map<int, int>> _neighbours = std::vector<std::map<int, int>>(NNBRS);

// Vector of maps of periodic neighbours to their halo sizes after partitioning
std::vector<std::map<int, int>> _neighbours_p = std::vector<std::map<int, int>>(NNBRS);

public:
struct LIB_EXPORT Factory {
/*!
4 changes: 2 additions & 2 deletions README.md
@@ -8,12 +8,12 @@ A [dedicated clang format file](https://github.com/nextsimhub/nextsimdg/blob/mai

## Problem Statement

We address the problem of domain decomposition of numerical ocean and sea-ice models. Such models typically use a sea-land mask to omit unnecessary computations on land. A typical approach is to use a cartesian (or rectilinear or general block distribution) domain decomposition among processors, where the domain is divided in equally sized sub-domains, ignoring the sea-land mask. This, however, may lead to significant load imbalance which impedes scalability.
We address the problem of domain decomposition of numerical ocean and sea-ice models. Such models typically use a sea-land mask to omit unnecessary computations on land. A typical approach is to use a Cartesian (or rectilinear or general block distribution) domain decomposition among processors, where the domain is divided in equally sized sub-domains, ignoring the sea-land mask. This, however, may lead to significant load imbalance which impedes scalability.

We have identified the following requirements for the domain decomposition algorithm:
- produce balanced sub-domains in terms of work and communication
- produce rectangular sub-domains as this will help in handling communication during halo exchanges between neighbouring processes
- be static (computed either offline or during the initialiasation phase)
- be static (computed either offline or during the initialisation phase)
- be scalable

The proposed approach is based on the Recursive Coordinate Bisection (RCB) geometric partitioning algorithm [^1]. Geometric coordinates are first partitioned into two balanced parts. Partitioning continues recursively in each part until the desired number of balanced parts has been created. The algorithm can be tuned to build rectilinear partitions. We are using the implementation of the RCB algorithm available in the [Zoltan](https://sandialabs.github.io/Zoltan/) library developed by Sandia National Laboratories.
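For readers unfamiliar with RCB, the following is a minimal conceptual sketch (not the Zoltan implementation used here): points are split at a coordinate cut into two load-balanced groups, and each group is partitioned recursively until the requested number of parts is reached.

```cpp
#include <algorithm>
#include <vector>

struct Point {
    double x, y;
};

// Conceptual recursive coordinate bisection: alternate the cut axis with depth and
// place the cut so each side receives a share of points proportional to the number
// of parts assigned to it.
void rcb(std::vector<Point> pts, int parts, int depth, std::vector<std::vector<Point>>& out)
{
    if (parts <= 1 || pts.size() <= 1) {
        out.push_back(pts);
        return;
    }
    const int lo_parts = parts / 2;
    const bool split_x = (depth % 2 == 0);
    auto cut = pts.begin() + (pts.size() * lo_parts) / parts;
    std::nth_element(pts.begin(), cut, pts.end(),
        [split_x](const Point& a, const Point& b) { return split_x ? a.x < b.x : a.y < b.y; });
    rcb(std::vector<Point>(pts.begin(), cut), lo_parts, depth + 1, out);
    rcb(std::vector<Point>(cut, pts.end()), parts - lo_parts, depth + 1, out);
}
```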