Draft: Catch2 Benchmarking #1723
base: develop

Conversation
#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && !BOOST_LANG_CUDA
#    error If ALPAKA_ACC_GPU_CUDA_ENABLED is set, the compiler has to support CUDA!
#endif

#if defined(ALPAKA_ACC_GPU_HIP_ENABLED) && !BOOST_LANG_HIP
#    error If ALPAKA_ACC_GPU_HIP_ENABLED is set, the compiler has to support HIP!
#endif
I dislike those. Can't we just have a prelude in alpaka.hpp, after BoostPredef, that checks those in one place?
As long as it takes ALPAKA_HOST_ONLY into account.
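A minimal sketch of what such a prelude could look like, assuming a single header included from alpaka.hpp right after the Boost.Predef include; the file name and the exact shape of the ALPAKA_HOST_ONLY guard are assumptions for illustration, not the actual implementation:

// PreludeChecks.hpp (hypothetical) - included from alpaka.hpp after <boost/predef.h>.
// When ALPAKA_HOST_ONLY is defined, the translation unit is deliberately compiled
// by a plain host compiler, so the device-language checks must be skipped.
#if !defined(ALPAKA_HOST_ONLY)

#    if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && !BOOST_LANG_CUDA
#        error If ALPAKA_ACC_GPU_CUDA_ENABLED is set, the compiler has to support CUDA!
#    endif

#    if defined(ALPAKA_ACC_GPU_HIP_ENABLED) && !BOOST_LANG_HIP
#        error If ALPAKA_ACC_GPU_HIP_ENABLED is set, the compiler has to support HIP!
#    endif

#endif // !defined(ALPAKA_HOST_ONLY)

This would let the individual headers drop their per-file checks while keeping the error messages unchanged.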
namespace alpaka::test
{
    //! The fixture for executing a kernel on a given accelerator.
Suggested change:
- //! The fixture for executing a kernel on a given accelerator.
+ //! The fixture for benchmarking the execution of a kernel on a given accelerator.
About the fixture - I don't think we can provide a universal benchmark fixture as we discussed earlier, i.e. one that would execute the kernel and pass in some pre-allocated buffers which were set up in the user's benchmark cpp code. The issue is two-fold:
Are you still working on this @sliwowitz?
Yes. I got stuck on the
I checked the output options again. Last time we had the problem that the output was not machine readable, but I found some documentation about the usage of the --reporter option. I tested your benchmark with:

$ build/ninja-omp2b-gcc-release/test/benchmark/rand/randBenchmark --reporter XML
<?xml version="1.0" encoding="UTF-8"?>
<Catch2TestRun name="randBenchmark" rng-seed="645286256" xml-format-version="2" catch2-version="3.3.2">
<TestCase name="defaultRandomGeneratorBenchmark" tags="[randBenchmark]" filename="/home/simeon/projects/alpaka/test/benchmark/rand/src/randBenchmark.cpp" line="53">
<BenchmarkResults name="Random sequence N=10" samples="100" resamples="100000" iterations="1" clockResolution="32.4883" estimatedDuration="8.6125e+06">
<!-- All values in nano seconds -->
<mean value="89822.5" lowerBound="85849.8" upperBound="103189" ci="0.95"/>
<standardDeviation value="33361.6" lowerBound="10991" upperBound="75389.8" ci="0.95"/>
<outliers variance="0.98889" lowMild="2" lowSevere="0" highMild="2" highSevere="2"/>
</BenchmarkResults>
<BenchmarkResults name="Random sequence N=100000" samples="100" resamples="100000" iterations="1" clockResolution="32.4883" estimatedDuration="7.2092e+06">
<!-- All values in nano seconds -->
<mean value="131106" lowerBound="97376" upperBound="287445" ci="0.95"/>
<standardDeviation value="317164" lowerBound="12744.5" upperBound="753666" ci="0.95"/>
<outliers variance="0.989974" lowMild="0" lowSevere="0" highMild="0" highSevere="2"/>
</BenchmarkResults>
<BenchmarkResults name="Random sequence N=1000000" samples="100" resamples="100000" iterations="1" clockResolution="32.4883" estimatedDuration="1.53628e+07">
<!-- All values in nano seconds -->
<mean value="229560" lowerBound="223253" upperBound="240870" ci="0.95"/>
<standardDeviation value="41958.1" lowerBound="25405.3" upperBound="79203.3" ci="0.95"/>
<outliers variance="0.935867" lowMild="11" lowSevere="0" highMild="0" highSevere="1"/>
</BenchmarkResults>
<BenchmarkResults name="Random sequence N=10000000" samples="100" resamples="100000" iterations="1" clockResolution="32.4883" estimatedDuration="1.02668e+08">
<!-- All values in nano seconds -->
<mean value="1.57844e+06" lowerBound="1.32217e+06" upperBound="2.17723e+06" ci="0.95"/>
<standardDeviation value="1.87999e+06" lowerBound="702312" upperBound="3.27425e+06" ci="0.95"/>
<outliers variance="0.989892" lowMild="0" lowSevere="0" highMild="1" highSevere="3"/>
</BenchmarkResults>
<BenchmarkResults name="Random sequence N=100000000" samples="100" resamples="100000" iterations="1" clockResolution="32.4883" estimatedDuration="1.00224e+09">
<!-- All values in nano seconds -->
<mean value="1.02198e+07" lowerBound="1.01508e+07" upperBound="1.03973e+07" ci="0.95"/>
<standardDeviation value="515800" lowerBound="116904" upperBound="994951" ci="0.95"/>
<outliers variance="0.484665" lowMild="2" lowSevere="0" highMild="1" highSevere="2"/>
</BenchmarkResults>
<BenchmarkResults name="Random sequence N=1000000000" samples="100" resamples="100000" iterations="1" clockResolution="32.4883" estimatedDuration="1.10758e+10">
<!-- All values in nano seconds -->
<mean value="1.04739e+08" lowerBound="1.03648e+08" upperBound="1.06501e+08" ci="0.95"/>
<standardDeviation value="6.91494e+06" lowerBound="4.89068e+06" upperBound="9.9287e+06" ci="0.95"/>
<outliers variance="0.625317" lowMild="2" lowSevere="0" highMild="0" highSevere="19"/>
</BenchmarkResults>
<OverallResult success="true" skips="0">
<StdOut>
Hardware threads: 64
temp debug normalized result = 18.7131 should probably converge to 0.5.Hardware threads: 64
temp debug normalized result = 18.7981 should probably converge to 0.5.Hardware threads: 64
temp debug normalized result = 9.672 should probably converge to 0.5.Hardware threads: 64
temp debug normalized result = 1.64295 should probably converge to 0.5.Hardware threads: 64
temp debug normalized result = 0.623814 should probably converge to 0.5.Hardware threads: 64
temp debug normalized result = 0.500023 should probably converge to 0.5.
</StdOut>
</OverallResult>
</TestCase>
<OverallResults successes="6" failures="0" expectedFailures="0" skips="0"/>
<OverallResultsCases successes="1" failures="0" expectedFailures="0" skips="0"/>
</Catch2TestRun>

There is also a JSON reporter, but for that we need to update Catch2 (only a new minor version): catchorg/Catch2#2706
I'd vote for the JSON reporter as it could make the output both machine- and human-readable :-)
In general, I also prefer JSON because it is more readable. But we should at least do a short test of whether XML and JSON provide the same amount of information. For example, the XML output uses comments to store the information that the times were measured in nanoseconds.
The JSON reporter is currently not working: it does not contain the benchmark results. The reporter is currently experimental and not fully implemented.
This is an example of using Catch2 facilities for benchmarking.
Putting this into Draft mode, since it's still WIP. It compiles and runs, but returns a wrong result, and probably also measures things we don't really want to measure; still, I want this out so others can share their comments.
I had to create another fixture for the benchmarks, based on the earlier KernelExecutionFixture. I thought about inheritance; it didn't work out for me on the first try, but maybe there's a way.

One catch with Catch2 benchmarks is that internally the BENCHMARK-marked code is run many times: first to estimate the runtime, then to collect enough data for meaningful statistics (this is what Catch2 calls iterations, and it can't be changed without modifying the Catch2 sources). This is why my KernelExecutionBenchmarkFixture first sets the memory up (a potentially lengthy operation, depending on what we want to measure in the next step) outside the BENCHMARK area. Inside the BENCHMARK, the memory is cleared/memset/whatever, because that part will be re-run multiple times. After resetting the memory, there is a meter.measure([&]{...}); call which encapsulates the part of the BENCHMARK that is actually to be measured; see the sketch below.
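A minimal sketch of this setup/reset/measure split in plain Catch2 (assuming the executable is linked against Catch2WithMain); the buffer and the loop standing in for the kernel launch are illustrative placeholders, not the actual fixture code:

#include <catch2/benchmark/catch_benchmark.hpp>
#include <catch2/catch_test_macros.hpp>

#include <algorithm>
#include <vector>

TEST_CASE("kernelBenchmarkSketch", "[benchmark]")
{
    // Potentially lengthy setup, done once outside the BENCHMARK: allocate buffers.
    std::vector<float> buffer(1'000'000);

    BENCHMARK_ADVANCED("run kernel")(Catch::Benchmark::Chronometer meter)
    {
        // This part is re-run for every sample: reset the memory to a known state.
        std::fill(buffer.begin(), buffer.end(), 0.0f);

        // Only the lambda passed to meter.measure() is actually timed.
        meter.measure(
            [&]
            {
                // Stand-in for the kernel execution we want to benchmark.
                for(auto& x : buffer)
                    x += 1.0f;
            });
    };
}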
You can build the benchmarks with alpaka_BUILD_BENCHMARK=ON. The executable will live in test/benchmark/rand/randBenchmark. If you run it, it will collect 100 samples, that is, it will run each benchmark 100*i times, where i is the number of iterations auto-estimated by Catch2 (it should be somewhere between 1 and 3). If you just want to see whether the benchmarks run, you can pass a parameter on the command line: test/benchmark/rand/randBenchmark --benchmark-samples=1 (--benchmark-samples=1 is also set when running in CI).
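For reference, a full configure/build/run sequence might look like the following; the build directory name is arbitrary, and the exact executable path depends on your generator and layout:

$ cmake -S . -B build -Dalpaka_BUILD_BENCHMARK=ON
$ cmake --build build
$ build/test/benchmark/rand/randBenchmark --benchmark-samples=1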
Known issues:
- The normalized results (marked temp debug in the output), which should all be around 0.5, are actually not.