-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/kokkos #188
base: develop
Are you sure you want to change the base?
Feature/kokkos #188
Conversation
Conflicts: CMakeLists.txt src/basic/CMakeLists.txt
Conflicts: .gitmodules src/common/Executor.cpp src/common/RAJAPerfSuite.cpp src/common/RunParams.cpp tpl/RAJA
Hi @artv3 -- Thanks for the question. I started working on the project late last year (first commit was Nov. 16, 2020), so it probably is an "older" version of the Perf Suite. I did not work on anything called "MAT_MAT_SHARED." Instead, I provided Kokkos translations of RAJAPerf Suite kernels (that existed around that date). Does that make sense? |
@davidbeckingsale , @DavidPoliakoff -- I'm also removing or suppressing this debugging output in this commit: Running specified kernels and variants... |
Yes, do you plan to add Kokkos kernels for newer kernels in RAJAPerf? |
Morning @artv3 -- Thanks again for your question, and your interest. @DavidPoliakoff, @davidbeckingsale and I have briefly discussed providing Kokkos translations for the new kernels , but that area of work has not been discussed yet with our project leads / folks who control the purse strings. Do you have a particular interest in Kokkos versions of the new kernels? If so, then it might help me strengthen the argument for doing the Kokkos translations of the new kernels. Does it benefit both groups to understand performance behavior over time? |
Yes absolutely! Those kernels express hierarchical parallelism and for our applications we have found that by taking advantage of the memory hierachy on the GPU (using shared memory [CUDA/HIP] or device local memory [SYCL]) we can really improve kernel performance. Recognizing that I always wonder how do you expose these device feature features in a portable and friendly way across programing models and of course minimize abstraction layerover head. As new programming models come online how do we maintain these features? |
Fixed timing code
list(APPEND RAJA_PERFSUITE_DEPENDS hip) | ||
endif() | ||
|
||
set(RAJAPERF_BUILD_SYSTYPE $ENV{SYS_TYPE}) | ||
set(RAJAPERF_BUILD_HOST $ENV{HOSTNAME}) | ||
|
||
if (ENABLE_CUDA) | ||
set(CMAKE_CUDA_STANDARD 11) | ||
if (ENABLE_CUDA AND ENABLE_KOKKOS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ENABLE_CUDA here is redundant (since we are already inside an if (ENABLE_CUDA)
block.
#set_target_properties(kokkoscore kokkoscontainers PROPERTIES LANGUAGE CUDA) | ||
endif() | ||
get_property(KOKKOS_INCLUDE_DIRS DIRECTORY tpl/kokkos PROPERTY INCLUDE_DIRECTORIES) | ||
include_directories(${KOKKOS_INCLUDE_DIRS}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required? I think that having kokkos
in the RAJA_PERFSUITE_DEPENDS should be enough.
src/CMakeLists.txt
Outdated
lcals-kokkos/INT_PREDICT-Kokkos.cpp | ||
lcals-kokkos/PLANCKIAN-Kokkos.cpp | ||
lcals-kokkos/TRIDIAG_ELIM-Kokkos.cpp | ||
#polybench/POLYBENCH_2MM.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these all commented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Polybench kernels are commented because we do not yet have Kokkos translations for them. I believe the situation calls for additional logic to deal with the relative immaturity of the Kokkos design. Does this make sense? It's a choice (commenting) David P and I made in the course of prototyping.
src/RAJAPerfSuiteDriver.cpp
Outdated
//rajaperf::RunParams params(argc, argv); | ||
//executor.registerGroup("Sparse"); | ||
|
||
//executor.registerKernel("Sparse", rajaperf::make_kernel_base( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted all debugging code.
src/apps/WIP-COUPLE.cpp
Outdated
@@ -20,184 +20,175 @@ namespace rajaperf | |||
namespace apps | |||
{ | |||
|
|||
// | |||
//COUPLE::COUPLE(const RunParams& params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots more commented code?
src/CMakeLists.txt
Outdated
add_subdirectory(apps) | ||
add_subdirectory(apps-kokkos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be guarded with ENABLE_KOKKOS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be guarded with ENABLE_KOKKOS
Hi David, if I am understanding your suggestion, guarding all things Kokkos in this file will require significant reorganization of this file. Is that what you would like for me to do, i.e., cleanly separate RAJAPerf Suite content from Kokkos content?
add_subdirectory(common) | ||
|
||
if(NOT INFRASTRUCTURE_ONLY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the INFRASTRUCTRUE_ONLY option come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the INFRASTRUCTRUE_ONLY option come from?
Hi David -- Taking a step back to explain the "big picture", remember that we are using RAJAPerf Suite (RPS) in two distinct manners: 1) Provide Kokkos translations of RPS kernels that can be run along side native RPS kernels, and 2) Using RPS as the infrastructure driver for Kokkos Kernels and Kokkos performance tests that we have written. To be clear, this second case has nothing to do with RPS kernels & their Kokkos translations. This second case is tied to the "INFRASTRUCTURE_ONLY" option. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The why makes sense, I meant that INFRASTRUCTURE_ONLY should be explicitly defined as a CMake option
(https://cmake.org/cmake/help/latest/command/option.html) with a default value of False, that you can override when you are using the perfsuite as the infrastructure driver.
src/CMakeLists.txt
Outdated
DEPENDS_ON ${RAJA_PERFSUITE_DEPENDS} | ||
) | ||
|
||
else() | ||
if(NOT INFRASTRUCTURE_ONLY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi David -- Please see my comment above on this point (i.e., there are two distinct scenarios in which we're using RAJAPerf Suite). Would it be possible for us to meet, and work through at least some / most of your suggested changes? I just want to make sure I'm doing what you want me to do for the merge.
@@ -158,15 +165,18 @@ void Executor::setupSuite() | |||
run_params.setInvalidExcludeFeatureInput(invalid); | |||
|
|||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: @rhornung67 , @davidbeckingsale , there is functionality we have not yet implemented in Kokkos, but that is important for RAJAPerf Suite. Starting at line 167, and ending at ~ line 342 in Executor.cpp, we have commented this functionality. Rich, David B. suggested that I work with you on how to handle integrating this (currently) commented functionality with Kokkos design (i.e., dynamic addition of new kernels and kernel groups).
str << std::endl; | ||
} // loop over features | ||
str.flush(); | ||
} | ||
|
||
// TODO for Kokkos Team: Commenting function body, because this infrastructure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: @rhornung67 , @davidbeckingsale - commented lines here, 654-674, also represent important functionality for RAJAPerf Suite that has not yet been implemented for Kokkos. Would you guys please help me work through integrating these bits?
Hey guys, I didn't yet read all the comments but I think we should try and split the infrastructure changes from the Kokkos kernel variant changes. So essentially something like this: @davidbeckingsale what do you think about this? Note that PR2 are not really dependent on PR1 so we could also do those two first and then tackle the infrastructure change. We could also split PR2 into doing kernel groups individually I guess? |
I agree that splitting the changes up into two distinct pieces would be best, and I suggested that approach before. I think the two main sticking points on this PR right now are 1) ensuring that the infrastructure changes maintain all the current capability, and 2) deciding how to handle cases where a variant is "missing". |
Hi David & Christian -- Many thanks for responding. As for sticking point 2, in the most recent push, I did stub in Kokkos variants for the polybench kernels, somewhat alleviating that issue. But we are still left with sticking point 1. I will work with Christian on this, and see if we can come up with a satisfactory solution that preserves current infrastructure capability. |
Summary
@davidbeckingsale and @rhornung67 have been our main points of contact and collaboration in this work
The proposed code implements RAJAPerf Suite (RPS) - based performance testing for Kokkos.
Main new code and changes include:
common/Executor.hpp: kernelID registerKernel(std::string, KernelBase*);
common/Executor.cpp: exec->registerKernel(groupName, kernel);
RAJAPerfSuiteDriver.cpp: //executor.registerKernel
Configure Kokkos build and runtime in RPS:
cmake
-DENABLE_KOKKOS=ON
-DENABLE_OPENMP=ON
-DCMAKE_BUILD_TYPE=Debug .. \
Please do not hesitate to contact @ajpowelsnl or @DavidPoliakoff for any questions and / or further discussion.