Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PCR Thomas Laplacian inversion solver #2340

Merged
merged 12 commits into from
Jul 28, 2022
Merged

Add PCR Thomas Laplacian inversion solver #2340

merged 12 commits into from
Jul 28, 2022

Conversation

JosephThomasParker
Copy link
Contributor

This adds another direct parallel solver, pcr_thomas, based on Laszlo, Gilles and Appleyard, Manycore Algorithms for Batch Scalar and Block Tridiagonal Solvers, ACM TOMS, 42, 31 (2016) and adapted from the parallel_tdma_cpp library.

It is very similar to the parallel cyclic reduction solver (#2330), except instead of performing PCR directly on the distributed tridiagonal matrix, it uses the Thomas algorithm on each core to get a reduced tridiagonal system with 2 equations per processor. It then calls PCR on this reduced system. This reduces the number of communications from O(log2(nx)) to O(log2(nxpe)), without changing the O(nx) complexity.

This solver passes the Laplace integrated tests, but I haven't done any performance analysis on it. This is really intended as a stepping stone to getting a node-aware solver, which would further reduce the communication count to O(log2(nnodes)) inter-node communications. To do this, we need to replace the Thomas solver with a parallel direct solver and work out how to extract the coefficients to feed into a node-level PCR. This PR gives the code structure to do this, but since it also adds a complete solver, I thought I'd add it as an option.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

}

if (dst) {
BOUT_OMP(parallel) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: if with identical then and else branches [bugprone-branch-clone]

if (dst) {
  ^
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx:230:5: note: else branch starts here
  } else {
    ^

src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.hxx Outdated Show resolved Hide resolved
Comment on lines 192 to 194
BoutReal zlen = coords->dz * (localmesh->LocalNz - 3);
BoutReal kwave =
kz * 2.0 * PI / (2. * zlen); // wave number is 1/[rad]; DST has extra 2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to what I asked for the pcr - why is there a 3 but the comment says 2?

Besides that, would that also work if dz is a Field2D?

I assume variation of dz along z wouldn't work due to FFTs require being uniform ...

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx Outdated Show resolved Hide resolved
@ZedThree
Copy link
Member

This would work best when transformed into a node-aware solver. The best way to do that might be running one MPI rank per node, and then filling up the node with OpenMP threads. So, we might want to wait for the outer-loop stuff in order to get the best performance.

@ZedThree ZedThree added this to the BOUT-5.0 milestone Jul 27, 2022
ZedThree added 6 commits July 28, 2022 15:44
* next: (1519 commits)
  Remove some useless forward declarations
  Revert "Remove some unused headers from vecops"
  Remove some unused headers from vecops
  Fix bad merge: missed removing extraneous `const` on return value
  Remove `Up/DownXSplitIndex`
  Remove last remaining uses of `deprecated.hxx`
  Apply clang-format
  Apply clang-format changes
  CI: Don't update submodules (already updated)
  CI: Workaround for git ownership checks in container
  CI: Bump actions/checkout version
  CI: Use latest version of cmake in clang-tidy-review
  Use MPI_COMM_NULL rather than nullptr
  Add 'u' format flag to BOUT.settings output
  Adds a 'u' format, commenting unused options
  Apply clang-format changes
  Apply clang-format changes
  Don't compile LaplaceXY2 with 3D metrics
  Apply clang-format changes
  Apply clang-format changes
  ...
- implicit narrowing conversion
- C-style array
- loop variable name too short
- multiple declarations on same line
- dead stores
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@ZedThree ZedThree changed the title PCR Thomas Add PCR Thomas Laplacian inversion solver Jul 28, 2022
@ZedThree ZedThree merged commit 8c6968b into next Jul 28, 2022
@ZedThree ZedThree deleted the pcr_thomas branch July 28, 2022 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants