-
Notifications
You must be signed in to change notification settings - Fork 570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Tpetra: Initial support for execution_space instances in MultiVector #13160
base: develop
Are you sure you want to change the base?
Conversation
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Using Repos:
Pull Request Author: csiefer2 |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
|
@romintomasetti Well the only test that failed was a randomly occurring one, so this looks like it worked. Next up: Reworking the rest of the relevant routines on MultiVector to take exec space instances. |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Using Repos:
Pull Request Author: csiefer2 |
@csiefer2 That's good news ! Do you think we need to rework the rest of the |
enableIfNonConstData<DualViewType> | ||
sync_host(const typename DualViewType::t_host::execution_space& exec, DualViewType dualView) { | ||
// This will sync, but only if needed | ||
dualView.sync_host(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dualView.sync_host(); | |
dualView.sync_host(exec); |
template <typename DualViewType> | ||
enableIfNonConstData<DualViewType> | ||
sync_host(DualViewType dualView) { | ||
// This will sync, but only if needed | ||
dualView.sync_host(); | ||
} | ||
|
||
template <typename DualViewType> | ||
enableIfNonConstData<DualViewType> | ||
sync_host(const typename DualViewType::t_host::execution_space& exec, DualViewType dualView) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync_host(const typename DualViewType::t_host::execution_space& exec, DualViewType dualView) { | |
sync_host(const typename DualViewType::t_host::execution_space& exec, DualViewType& dualView) { |
(Not sure why you want to take by value in this case...)
template <typename DualViewType> | ||
enableIfConstData<DualViewType> | ||
sync_host(DualViewType dualView) { } | ||
|
||
template <typename DualViewType> | ||
enableIfConstData<DualViewType> | ||
sync_host(const typename DualViewType::t_host::execution_space& exec, DualViewType dualView) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://github.com/kokkos/kokkos/blob/2d7715239700f50169bc50a96a234b05c28c9a2e/containers/src/Kokkos_DualView.hpp#L721-L736, it seems that the Kokkos::DualView
throws when calling sync_<host/device>
. Is it intended that you change this behavior by making it a no-op ?
void | ||
MultiVector<Scalar, LocalOrdinal, GlobalOrdinal, Node>:: | ||
scale (const execution_space& exec, const Scalar& alpha) | ||
{ | ||
using Kokkos::ALL; | ||
using IST = impl_scalar_type; | ||
|
||
const IST theAlpha = static_cast<IST> (alpha); | ||
if (theAlpha == Kokkos::ArithTraits<IST>::one ()) { | ||
return; // do nothing | ||
} | ||
const size_t lclNumRows = getLocalLength (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you make these copy-pastes ? I think we should really put some effort in designing this carefully so we avoid code duplication, which is highway to hell 😈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur 100%. This was a "test if it works" sort of thing. I suspect we can reduce the code using template arguments and if constexpr. Will try that next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romintomasetti How about something like this?
https://gcc.godbolt.org/z/xdfrE966q
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest... I don't really like it much. It would make the code hard to read, and a pain to maintain.
In Kokkos
, what they usually do (@masterleinad Could you confirm this is still the way to go ?) is that the overload that does not take an execution space instance first globally fences, then uses the other overload with the default execution space instance, and then fence. Like this:
template <typename ViewType>
void my_func(const ViewType& data)
{
Kokkos::fence(...);
typename ViewType::execution_space exec;
my_func(exec, data);
exec.fence(...);
}
template <typename Exec, typename ViewType>
void my_func(const Exec& exec, const ViewType& data)
{
...
}
I think this is a good design:
- the implementation is still easy to read, and to maintain because it will always work with an execution space instance (no need for
constexpr if
statements that kill the readability) - for a user that does not care much about asynchronicity, I think it's no big deal to have a global fence from time to time.
Here is an example: https://github.com/kokkos/kokkos/blob/33c9b8cef70eddabc3c1f07d053be23abf81e912/algorithms/src/Kokkos_Random.hpp#L1554-L1563.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romintomasetti The default execution space instance is not the same as no instance.
The first triggers fences on that one space. The latter triggers fences on the entire device.
Different behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csiefer2 I'll be out of office for 2 weeks. Would you like to discuss it in a dedicated meeting somewhere around the 25th of July?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romintomasetti Absolutely. Drop me an email to schedule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this get sorted out? Happy to discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this went out of my head, but we're definitely interested in discussing this !
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: PR_gcc
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: PR_clang
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: PR_cuda
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Build InformationTest Name: PR_cuda-uvm
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi_debug
Jenkins Parameters
Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Co-authored-by: Tomasetti Romin <[email protected]>
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
As discussed with @cwpearson , @romintomasetti and @maartenarnst
Issue: #12463
Tests: We need to add those before we merge
Stakeholder Feedback: Discussions at EuroTUG '24, Tpetra Team meeting 6/25/24
We don't want to merge this yet. We want to see what the AT says first. We also probably want to add a execution_space instance support to a few more functions.