Refactor Build System, Source Tree #27

Merged
merged 22 commits into from
Nov 4, 2024
Changes from 1 commit
6795200
Change name of benchmark directory
itzmeanjan Oct 29, 2024
acac202
Split a large Makefile into multiple smaller ones
itzmeanjan Oct 29, 2024
91b3da2
Change directory structure of library source tree
itzmeanjan Oct 29, 2024
2dc4e4d
Update tests to work with new directory structure of `sha3` headers
itzmeanjan Oct 29, 2024
b66c67f
Update benchmarks to use new directory structure of `sha3` headers
itzmeanjan Oct 29, 2024
7559110
Refactor examples; make them easy to run from CLI
itzmeanjan Oct 29, 2024
eeaa963
Update Github actions CI script to run tests and examples on Github p…
itzmeanjan Oct 29, 2024
6768d59
Minor code refactoring
itzmeanjan Oct 30, 2024
9516d21
Increase columnlimit to 120 in clang-format style spec. file
itzmeanjan Oct 30, 2024
551e97c
Apply permutation after zeroizing in ratchet operation
itzmeanjan Oct 31, 2024
502fd00
Use less verbose pattern substitution in Makefile
itzmeanjan Oct 31, 2024
587e97c
Introduce compiler attribute based force-inline MACRO definition
itzmeanjan Oct 31, 2024
df43bdf
Reduce min. warmup time to 50ms when running benchmarks
itzmeanjan Nov 3, 2024
f977615
Add benchmark results JSON file to vcs
itzmeanjan Nov 3, 2024
ae03272
Simplify README file - reduce clutter
itzmeanjan Nov 3, 2024
b1e55c0
Add ability to generate help text for Makefile commands
itzmeanjan Nov 4, 2024
a412bba
Mention all new Makefile targets in README file
itzmeanjan Nov 4, 2024
dfdb203
Use correct command to run tests on Github Actions CI
itzmeanjan Nov 4, 2024
8ead77e
Add link to source of inspiration for Makefile target documentation
itzmeanjan Nov 4, 2024
c82d3e7
Remove unnecessary backtick
itzmeanjan Nov 4, 2024
5296725
Add benchmark results in JSON format for more targets
itzmeanjan Nov 4, 2024
b235bc9
Merge branch 'master' into refactor-makefile
itzmeanjan Nov 4, 2024
Increase columnlimit to 120 in clang-format style spec. file
Signed-off-by: Anjan Roy <[email protected]>
itzmeanjan committed Oct 30, 2024

Verified

This commit was signed with the committer’s verified signature.
itzmeanjan Anjan Roy
commit 9516d21e618c33a6c43311e6e7b159c1bf2b30d4
2 changes: 1 addition & 1 deletion .clang-format
@@ -81,7 +81,7 @@ BreakBeforeTernaryOperators: true
BreakConstructorInitializers: BeforeComma
BreakInheritanceList: BeforeComma
BreakStringLiterals: true
-ColumnLimit: 80
+ColumnLimit: 120
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerIndentWidth: 2
9 changes: 2 additions & 7 deletions benches/bench_common.hpp
@@ -4,13 +4,8 @@
#include <span>
#include <vector>

-const auto compute_min = [](const std::vector<double>& v) -> double {
-return *std::min_element(v.begin(), v.end());
-};
-
-const auto compute_max = [](const std::vector<double>& v) -> double {
-return *std::max_element(v.begin(), v.end());
-};
+const auto compute_min = [](const std::vector<double>& v) -> double { return *std::min_element(v.begin(), v.end()); };
+const auto compute_max = [](const std::vector<double>& v) -> double { return *std::max_element(v.begin(), v.end()); };

// Generates N -many random values of type T | N >= 0
template<typename T>
6 changes: 2 additions & 4 deletions benches/bench_xof.cpp
@@ -3,8 +3,7 @@
#include "sha3/shake256.hpp"
#include <benchmark/benchmark.h>

-// Benchmarks SHAKE-128 extendable output function with variable length input
-// and squeezed output.
+// Benchmarks SHAKE-128 extendable output function with variable length input and squeezed output.
//
// Note, all input bytes are absorbed in a single call to `absorb` function.
// And all output bytes are squeezed in a single call to `squeeze` function.
@@ -39,8 +38,7 @@ bench_shake128(benchmark::State& state)
#endif
}

-// Benchmarks SHAKE-256 extendable output function with variable length input
-// and squeezed output.
+// Benchmarks SHAKE-256 extendable output function with variable length input and squeezed output.
//
// Note, all input bytes are absorbed in a single call to `absorb` function.
// And all output bytes are squeezed in a single call to `squeeze` function.
3 changes: 1 addition & 2 deletions examples/example_helper.hpp
@@ -21,8 +21,7 @@ random_data(std::span<T> data)
}
}

-// Given a bytearray of length N, this function converts it to human readable
-// hex string of length N << 1 | N >= 0
+// Given a bytearray of length N, this function converts it to human readable hex string of length N << 1 | N >= 0
static inline std::string
to_hex(std::span<const uint8_t> bytes)
{
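The body of `to_hex` is not part of this hunk. As a rough illustration of the contract stated in its comment (N bytes in, 2N hex characters out), here is a minimal sketch. The name `to_hex_sketch` is hypothetical, and it takes a `std::vector` instead of `std::span` only so that it also builds as C++17; it is not the helper from the repository.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for `to_hex`: converts N bytes to a lowercase hex string of length 2N.
static inline std::string
to_hex_sketch(const std::vector<uint8_t>& bytes)
{
  static constexpr char digits[] = "0123456789abcdef";

  std::string hex;
  hex.reserve(bytes.size() << 1);

  for (const uint8_t b : bytes) {
    hex.push_back(digits[b >> 4]);   // high nibble first
    hex.push_back(digits[b & 0x0f]); // then low nibble
  }

  assert(hex.size() == (bytes.size() << 1));
  return hex;
}
```

An empty input yields an empty string, which matches the `N >= 0` note in the comment.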
114 changes: 49 additions & 65 deletions include/sha3/internals/keccak.hpp
@@ -18,8 +18,7 @@ static constexpr size_t LANE_BW = 1ul << L;
static constexpr size_t STATE_BIT_LEN = 1600;

// Byte length of Keccak-p[1600, 24] permutation state
-static constexpr size_t STATE_BYTE_LEN =
-STATE_BIT_LEN / std::numeric_limits<uint8_t>::digits;
+static constexpr size_t STATE_BYTE_LEN = STATE_BIT_LEN / std::numeric_limits<uint8_t>::digits;

// # -of lanes ( each of 64 -bit width ) in Keccak-p[1600, 24] state
static constexpr size_t LANE_CNT = STATE_BIT_LEN / LANE_BW;
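For orientation, the constants in this header all follow from the state size b = 1600 and the relation nr = 12 + 2l. The sketch below recomputes them with compile-time checks. The namespace `keccak_sketch` is hypothetical, and `L = 6` is an assumption inferred from the 64-bit lane width, since the definition of `L` is outside the lines shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

namespace keccak_sketch {

constexpr std::size_t L = 6;                // assumed: log2 of the lane width
constexpr std::size_t LANE_BW = 1ul << L;   // 64 -bit lanes
constexpr std::size_t STATE_BIT_LEN = 1600; // b
constexpr std::size_t STATE_BYTE_LEN = STATE_BIT_LEN / std::numeric_limits<uint8_t>::digits;
constexpr std::size_t LANE_CNT = STATE_BIT_LEN / LANE_BW;
constexpr std::size_t ROUNDS = 12 + 2 * L;  // nr = 12 + 2l

static_assert(LANE_BW == 64, "each lane is 64 -bit wide");
static_assert(STATE_BYTE_LEN == 200, "1600 bits are 200 bytes");
static_assert(LANE_CNT == 25, "5 x 5 lanes");
static_assert(ROUNDS == 24, "Keccak-p[1600, 24]");

} // namespace keccak_sketch
```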
@@ -28,41 +27,36 @@ static constexpr size_t LANE_CNT = STATE_BIT_LEN / LANE_BW;
// s.t. b = 1600, w = b/ 25, l = log2(w), nr = 12 + 2l
static constexpr size_t ROUNDS = 12 + 2 * L;

-// Leftwards circular rotation offset of 25 lanes ( each lane is 64 -bit wide )
-// of state array, as provided in table 2 below algorithm 2 in section 3.2.2 of
-// https://dx.doi.org/10.6028/NIST.FIPS.202
+// Leftwards circular rotation offset of 25 lanes ( each lane is 64 -bit wide ) of state array, as provided in table 2
+// below algorithm 2 in section 3.2.2 of https://dx.doi.org/10.6028/NIST.FIPS.202
//
-// Note, following offsets are obtained by performing % 64 ( bit width of lane )
-// on offsets provided in above mentioned link
-static constexpr size_t ROT[LANE_CNT]{
-0 % LANE_BW, 1 % LANE_BW, 190 % LANE_BW, 28 % LANE_BW, 91 % LANE_BW,
-36 % LANE_BW, 300 % LANE_BW, 6 % LANE_BW, 55 % LANE_BW, 276 % LANE_BW,
-3 % LANE_BW, 10 % LANE_BW, 171 % LANE_BW, 153 % LANE_BW, 231 % LANE_BW,
-105 % LANE_BW, 45 % LANE_BW, 15 % LANE_BW, 21 % LANE_BW, 136 % LANE_BW,
-210 % LANE_BW, 66 % LANE_BW, 253 % LANE_BW, 120 % LANE_BW, 78 % LANE_BW
-};
-
-// Precomputed table used for looking up source index during application of π
-// step mapping function on keccak-p[1600, 24] state
+// Note, following offsets are obtained by performing % 64 ( bit width of lane ) on offsets provided in above mentioned
+// link
+static constexpr size_t ROT[LANE_CNT]{ 0 % LANE_BW, 1 % LANE_BW, 190 % LANE_BW, 28 % LANE_BW, 91 % LANE_BW,
+36 % LANE_BW, 300 % LANE_BW, 6 % LANE_BW, 55 % LANE_BW, 276 % LANE_BW,
+3 % LANE_BW, 10 % LANE_BW, 171 % LANE_BW, 153 % LANE_BW, 231 % LANE_BW,
+105 % LANE_BW, 45 % LANE_BW, 15 % LANE_BW, 21 % LANE_BW, 136 % LANE_BW,
+210 % LANE_BW, 66 % LANE_BW, 253 % LANE_BW, 120 % LANE_BW, 78 % LANE_BW };
+
+// Precomputed table used for looking up source index during application of π step mapping function on keccak-p[1600, 24]
+// state
//
// print('to <= from')
// for y in range(5):
// for x in range(5):
// print(f'{y * 5 + x} <= {x * 5 + (x + 3 * y) % 5}')
//
-// Table generated using above Python code snippet. See section 3.2.3 of the
-// specification https://dx.doi.org/10.6028/NIST.FIPS.202
-static constexpr size_t PERM[LANE_CNT]{ 0, 6, 12, 18, 24, 3, 9, 10, 16,
-22, 1, 7, 13, 19, 20, 4, 5, 11,
-17, 23, 2, 8, 14, 15, 21 };
-
-// Computes single bit of Keccak-p[1600, 24] round constant ( at compile-time ),
-// using binary LFSR, defined by primitive polynomial x^8 + x^6 + x^5 + x^4 + 1
+// Table generated using above Python code snippet. See section 3.2.3 of the specification
+// https://dx.doi.org/10.6028/NIST.FIPS.202
+static constexpr size_t PERM[LANE_CNT]{ 0, 6, 12, 18, 24, 3, 9, 10, 16, 22, 1, 7, 13,
+19, 20, 4, 5, 11, 17, 23, 2, 8, 14, 15, 21 };
+
+// Computes single bit of Keccak-p[1600, 24] round constant ( at compile-time ), using binary LFSR, defined by primitive
+// polynomial x^8 + x^6 + x^5 + x^4 + 1
//
// See algorithm 5 in section 3.2.5 of http://dx.doi.org/10.6028/NIST.FIPS.202
//
-// Taken from
-// https://github.com/itzmeanjan/elephant/blob/2a21c7e/include/keccak.hpp#L24-L59
+// Taken from https://github.com/itzmeanjan/elephant/blob/2a21c7e/include/keccak.hpp#L24-L59
consteval static bool
rc(const size_t t)
{
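The `PERM` table in this hunk is documented as the output of a short Python snippet. The same table could plausibly be derived at compile time instead of being written out by hand. The sketch below ports that snippet to C++; `make_perm_sketch` and `PERM_SKETCH` are hypothetical names, not part of the header.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: rebuilds the π source-index table, mirroring the Python snippet in the comment.
constexpr std::array<std::size_t, 25>
make_perm_sketch()
{
  std::array<std::size_t, 25> perm{};

  for (std::size_t y = 0; y < 5; y++) {
    for (std::size_t x = 0; x < 5; x++) {
      perm[y * 5 + x] = x * 5 + (x + 3 * y) % 5; // to <= from
    }
  }

  return perm;
}

constexpr auto PERM_SKETCH = make_perm_sketch();

// Spot-check a few entries against the literal table in the diff.
static_assert(PERM_SKETCH[0] == 0 && PERM_SKETCH[1] == 6 && PERM_SKETCH[5] == 3 && PERM_SKETCH[24] == 21,
              "generated table must agree with the hand-written one");
```

Applying π with such a table is then a plain gather, `ostate[i] = istate[PERM_SKETCH[i]]`, which is why the input and output states must not alias.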
@@ -73,8 +67,7 @@ rc(const size_t t)

// step 2 of algorithm 5
//
-// note, step 3.a of algorithm 5 is also being
-// executed in this statement ( for first iteration, with i = 1 ) !
+// note, step 3.a of algorithm 5 is also being executed in this statement ( for first iteration, with i = 1 ) !
uint16_t r = 0b10000000;

// step 3 of algorithm 5
@@ -88,19 +81,17 @@ rc(const size_t t)

// step 3.f of algorithm 5
//
-// note, this statement also executes step 3.a for upcoming
-// iterations ( i.e. when i > 1 )
+// note, this statement also executes step 3.a for upcoming iterations ( i.e. when i > 1 )
r >>= 1;
}

return static_cast<bool>((r >> 7) & 1);
}

-// Computes 64 -bit round constant ( at compile-time ), which is XOR-ed into
-// very first lane ( = lane(0, 0) ) of Keccak-p[1600, 24] permutation state
+// Computes 64 -bit round constant ( at compile-time ), which is XOR-ed into very first lane ( = lane(0, 0) ) of
+// Keccak-p[1600, 24] permutation state
//
-// Taken from
-// https://github.com/itzmeanjan/elephant/blob/2a21c7e/include/keccak.hpp#L61-L74
+// Taken from https://github.com/itzmeanjan/elephant/blob/2a21c7e/include/keccak.hpp#L61-L74
consteval static uint64_t
compute_rc(const size_t r_idx)
{
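The bodies of `rc` and `compute_rc` are mostly elided from this diff. To make the idea concrete, here is a sketch of one way to derive all 24 round constants at compile time. It uses the compact byte-wide LFSR formulation commonly seen in small Keccak implementations, not the bit-by-bit form of algorithm 5 that the comments describe, so treat it as an equivalent alternative under that assumption. `compute_rcs_sketch` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: derives the 24 round constants with an 8-bit LFSR, at compile time.
constexpr std::array<uint64_t, 24>
compute_rcs_sketch()
{
  std::array<uint64_t, 24> rcs{};
  uint8_t lfsr = 0x01;

  for (std::size_t round = 0; round < 24; round++) {
    for (std::size_t j = 0; j < 7; j++) {
      // The current LFSR output bit lands at bit position 2^j - 1 of the round constant.
      if ((lfsr & 0x01u) != 0) {
        rcs[round] ^= uint64_t{ 1 } << ((std::size_t{ 1 } << j) - 1);
      }

      // Advance the LFSR; 0x71 encodes the feedback taps x^6 + x^5 + x^4 + 1.
      const bool carry = (lfsr & 0x80u) != 0;
      lfsr = static_cast<uint8_t>(lfsr << 1);
      if (carry) {
        lfsr = static_cast<uint8_t>(lfsr ^ 0x71u);
      }
    }
  }

  return rcs;
}

constexpr auto RC_SKETCH = compute_rcs_sketch();
static_assert(RC_SKETCH[0] == 0x0000000000000001ull, "first round constant");
static_assert(RC_SKETCH[1] == 0x0000000000008082ull, "second round constant");
```

Because everything is `constexpr`, a mistake in the LFSR shows up as a build failure rather than as a wrong digest at runtime.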
@@ -126,16 +117,15 @@ compute_rcs()
return res;
}

-// Round constants to be XORed with lane (0, 0) of keccak-p[1600, 24]
-// permutation state, see section 3.2.5 of
+// Round constants to be XORed with lane (0, 0) of keccak-p[1600, 24] permutation state, see section 3.2.5 of
// https://dx.doi.org/10.6028/NIST.FIPS.202
static constexpr auto RC = compute_rcs();

#if defined __APPLE__ && defined __aarch64__ // On Apple Silicon

-// Keccak-p[1600, 24] round function, applying all five step mapping functions,
-// updating state array. Note this implementation of round function applies four
-// consecutive rounds in a single call i.e. if you invoke it to apply round `i`
+// Keccak-p[1600, 24] round function, applying all five step mapping functions, updating state array. Note this
+// implementation of round function applies four consecutive rounds in a single call i.e. if you invoke it to apply
+// round `i`
//
// - it first applies round `i`
// - then round `i+1`
@@ -144,9 +134,8 @@ static constexpr auto RC = compute_rcs();
//
// See section 3.3 of https://dx.doi.org/10.6028/NIST.FIPS.202
//
-// This Keccak round function implementation is specifically targeting Apple
-// Silicon CPUs. And this implementation collects a lot of inspiration from
-// https://github.com/bwesterb/armed-keccak.git.
+// This Keccak round function implementation is specifically targeting Apple Silicon CPUs. And this implementation
+// collects a lot of inspiration from https://github.com/bwesterb/armed-keccak.git.
static inline constexpr void
roundx4(uint64_t* const state, const size_t ridx)
{
@@ -155,15 +144,13 @@ roundx4(uint64_t* const state, const size_t ridx)

// Round ridx + 0
#if defined __clang__
-// Following
-// https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
+// Following https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

#pragma clang loop unroll(enable)
#pragma clang loop vectorize(enable)
#pragma clang loop interleave(enable)
#elif defined __GNUG__
-// Following
-// https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html#Loop-Specific-Pragmas
+// Following https://gcc.gnu.org/onlinedocs/gcc/Loop-Specific-Pragmas.html#Loop-Specific-Pragmas

#pragma GCC unroll 5
#pragma GCC ivdep
@@ -596,8 +583,8 @@ roundx4(uint64_t* const state, const size_t ridx)

#else // On everywhere else

-// Keccak-p[1600, 24] step mapping function θ, see section 3.2.1 of SHA3
-// specification https://dx.doi.org/10.6028/NIST.FIPS.202
+// Keccak-p[1600, 24] step mapping function θ, see section 3.2.1 of SHA3 specification
+// https://dx.doi.org/10.6028/NIST.FIPS.202
static inline constexpr void
theta(uint64_t* const state)
{
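Only the signature of `theta` is visible here. For readers without the specification at hand, a plain reference-style sketch of θ follows: compute the five column parities, then XOR a combination of the two neighbouring parities into every lane of each column. All names ending in `_sketch` are hypothetical, the state is passed by value to keep the function easy to check at compile time, and lane `(x, y)` is assumed to live at index `5 * y + x`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Rotates a 64-bit lane left by n bits, n in [0, 64).
constexpr uint64_t
rotl64_sketch(const uint64_t v, const unsigned n)
{
  return (n == 0) ? v : ((v << n) | (v >> (64u - n)));
}

// Hypothetical helper: all-zero state with a single lane set, handy for probing one step.
constexpr std::array<uint64_t, 25>
single_lane_sketch(const std::size_t idx, const uint64_t value)
{
  std::array<uint64_t, 25> s{};
  s[idx] = value;
  return s;
}

// Reference-style θ: not the optimized routine from the header.
constexpr std::array<uint64_t, 25>
theta_sketch(std::array<uint64_t, 25> s)
{
  // Column parities C[x].
  std::array<uint64_t, 5> c{};
  for (std::size_t x = 0; x < 5; x++) {
    c[x] = s[x] ^ s[x + 5] ^ s[x + 10] ^ s[x + 15] ^ s[x + 20];
  }

  // D[x] = C[x - 1] ^ rotl(C[x + 1], 1), mixed into every lane of column x.
  for (std::size_t x = 0; x < 5; x++) {
    const uint64_t d = c[(x + 4) % 5] ^ rotl64_sketch(c[(x + 1) % 5], 1);
    for (std::size_t y = 0; y < 5; y++) {
      s[5 * y + x] ^= d;
    }
  }

  return s;
}
```

With only lane 0 set to 1, column 1 receives the parity unchanged and column 4 receives it rotated left by one bit, which makes the diffusion pattern easy to see.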
@@ -643,8 +630,8 @@ theta(uint64_t* const state)
}
}

-// Keccak-p[1600, 24] step mapping function ρ, see section 3.2.2 of SHA3
-// specification https://dx.doi.org/10.6028/NIST.FIPS.202
+// Keccak-p[1600, 24] step mapping function ρ, see section 3.2.2 of SHA3 specification
+// https://dx.doi.org/10.6028/NIST.FIPS.202
static inline constexpr void
rho(uint64_t* const state)
{
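The loop inside `rho` is not shown in this hunk. Conceptually ρ rotates every lane left by its entry in the `ROT` table. The sketch below spells that out with a local copy of the offsets from this diff; `ROT_SKETCH`, `rho_sketch` and the helpers are hypothetical names, and the state is passed by value purely for easy compile-time checking.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Rotates a 64-bit lane left by n bits, n in [0, 64).
constexpr uint64_t
rotl64_sketch(const uint64_t v, const unsigned n)
{
  return (n == 0) ? v : ((v << n) | (v >> (64u - n)));
}

// Rotation offsets copied from the ROT table in the diff, reduced modulo the 64 -bit lane width.
constexpr unsigned ROT_SKETCH[25]{ 0 % 64,   1 % 64,  190 % 64, 28 % 64,  91 % 64,
                                   36 % 64,  300 % 64, 6 % 64,  55 % 64,  276 % 64,
                                   3 % 64,   10 % 64, 171 % 64, 153 % 64, 231 % 64,
                                   105 % 64, 45 % 64, 15 % 64,  21 % 64,  136 % 64,
                                   210 % 64, 66 % 64, 253 % 64, 120 % 64, 78 % 64 };

// Hypothetical helper: state with every lane set to the same value.
constexpr std::array<uint64_t, 25>
filled_sketch(const uint64_t value)
{
  std::array<uint64_t, 25> s{};
  for (std::size_t i = 0; i < 25; i++) {
    s[i] = value;
  }
  return s;
}

// Reference-style ρ: each lane is rotated left by its own offset.
constexpr std::array<uint64_t, 25>
rho_sketch(std::array<uint64_t, 25> s)
{
  for (std::size_t i = 0; i < 25; i++) {
    s[i] = rotl64_sketch(s[i], ROT_SKETCH[i]);
  }
  return s;
}
```

Reducing the offsets modulo 64 up front matters because shifting a 64-bit value by 64 or more bits is undefined behaviour in C++.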
@@ -661,8 +648,8 @@ rho(uint64_t* const state)
}
}

-// Keccak-p[1600, 24] step mapping function π, see section 3.2.3 of SHA3
-// specification https://dx.doi.org/10.6028/NIST.FIPS.202
+// Keccak-p[1600, 24] step mapping function π, see section 3.2.3 of SHA3 specification
+// https://dx.doi.org/10.6028/NIST.FIPS.202
static inline constexpr void
pi(const uint64_t* const __restrict istate, // input permutation state
uint64_t* const __restrict ostate // output permutation state
@@ -681,8 +668,8 @@ pi(const uint64_t* const __restrict istate, // input permutation state
}
}

-// Keccak-p[1600, 24] step mapping function χ, see section 3.2.4 of SHA3
-// specification https://dx.doi.org/10.6028/NIST.FIPS.202
+// Keccak-p[1600, 24] step mapping function χ, see section 3.2.4 of SHA3 specification
+// https://dx.doi.org/10.6028/NIST.FIPS.202
static inline constexpr void
chi(uint64_t* const state)
{
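χ is the only non-linear step, and its body is elided above. A reference-style sketch: within each row of five lanes, every lane is XORed with the AND of the complemented next lane and the lane after that. `chi_sketch` and `single_lane_sketch` are hypothetical names, and lane `(x, y)` is assumed to sit at index `5 * y + x`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: all-zero state with a single lane set.
constexpr std::array<uint64_t, 25>
single_lane_sketch(const std::size_t idx, const uint64_t value)
{
  std::array<uint64_t, 25> s{};
  s[idx] = value;
  return s;
}

// Reference-style χ: not the optimized routine from the header.
constexpr std::array<uint64_t, 25>
chi_sketch(std::array<uint64_t, 25> s)
{
  for (std::size_t y = 0; y < 5; y++) {
    // Snapshot the row first, since every output lane depends on two not-yet-updated neighbours.
    std::array<uint64_t, 5> row{};
    for (std::size_t x = 0; x < 5; x++) {
      row[x] = s[5 * y + x];
    }

    for (std::size_t x = 0; x < 5; x++) {
      s[5 * y + x] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5]);
    }
  }

  return s;
}
```

With only lane 2 set, lane 0 picks up the bit (its next lane is zero and the one after is lane 2) while lane 2 itself is unchanged.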
@@ -708,18 +695,17 @@ chi(uint64_t* const state)
}
}

-// Keccak-p[1600, 24] step mapping function ι, see section 3.2.5 of SHA3
-// specification https://dx.doi.org/10.6028/NIST.FIPS.202
+// Keccak-p[1600, 24] step mapping function ι, see section 3.2.5 of SHA3 specification
+// https://dx.doi.org/10.6028/NIST.FIPS.202
static inline constexpr void
iota(uint64_t* const state, const size_t ridx)
{
state[0] ^= RC[ridx];
}

-// Keccak-p[1600, 24] round function, which applies all five step mapping
-// functions in order, updates state array. Note this implementation of round
-// function applies two consecutive rounds in a single call i.e. if you invoke
-// it to apply round `i` - it first applies round `i` and then round `i+1`.
+// Keccak-p[1600, 24] round function, which applies all five step mapping functions in order, updates state array. Note
+// this implementation of round function applies two consecutive rounds in a single call i.e. if you invoke it to apply
+// round `i` - it first applies round `i` and then round `i+1`.
//
// See section 3.3 of https://dx.doi.org/10.6028/NIST.FIPS.202
static inline constexpr void
@@ -744,10 +730,8 @@ roundx2(uint64_t* const state, const size_t ridx)

#endif

-// Keccak-p[1600, 24] permutation, applying 24 rounds of permutation
-// on state of dimension 5 x 5 x 64 ( = 1600 ) -bits, using algorithm 7
-// defined in section 3.3 of SHA3 specification
-// https://dx.doi.org/10.6028/NIST.FIPS.202
+// Keccak-p[1600, 24] permutation, applying 24 rounds of permutation on state of dimension 5 x 5 x 64 ( = 1600 ) -bits,
+// using algorithm 7 defined in section 3.3 of SHA3 specification https://dx.doi.org/10.6028/NIST.FIPS.202
inline constexpr void
permute(uint64_t state[LANE_CNT])
{
52 changes: 19 additions & 33 deletions include/sha3/internals/sponge.hpp
@@ -11,35 +11,28 @@
// Keccak family of sponge functions
namespace sponge {

-// Compile-time check to ensure that domain separator can only be 2/ 4 -bits
-// wide.
+// Compile-time check to ensure that domain separator can only be 2/ 4 -bits wide.
//
-// When used in context of extendable output functions ( SHAKE{128, 256} )
-// domain separator bits are 4 -bit wide.
+// When used in context of extendable output functions ( SHAKE{128, 256} ) domain separator bits are 4 -bit wide.
//
-// See section 6.{1, 2} of SHA3 specification
-// https://dx.doi.org/10.6028/NIST.FIPS.202
+// See section 6.{1, 2} of SHA3 specification https://dx.doi.org/10.6028/NIST.FIPS.202
constexpr bool
check_domain_separator(const size_t dom_sep_bit_len)
{
return (dom_sep_bit_len == 2u) | (dom_sep_bit_len == 4u);
}

-// Pad10*1 - generates a padding, while also considering domain separator bits (
-// which are either 2 or 4 -bit wide ), such that when both domain separator
-// bits and 10*1 padding is appended ( in order ) to actual message, total byte
-// length of message consumed into keccak-p[1600, 24] permutation becomes
-// multiple of `rate` -bits. The only parameter `offset` denotes how many bytes
-// are already mixed with rate portion of permutation state meaning `offset`
-// must ∈ [0, `rate/ 8`). This routine returns a byte array of length `rate/ 8`
-// -bytes which can safely be mixed into permutation state during sponge
-// finalization phase.
+// Pad10*1 - generates a padding, while also considering domain separator bits ( which are either 2 or 4 -bit wide ),
+// such that when both domain separator bits and 10*1 padding is appended ( in order ) to actual message, total byte
+// length of message consumed into keccak-p[1600, 24] permutation becomes multiple of `rate` -bits. The only parameter
+// `offset` denotes how many bytes are already mixed with rate portion of permutation state meaning `offset` must ∈ [0,
+// `rate/ 8`). This routine returns a byte array of length `rate/ 8` -bytes which can safely be mixed into permutation
+// state during sponge finalization phase.
//
// This function implementation collects motivation from
// https://github.com/itzmeanjan/turboshake/blob/e1a6b950/src/sponge.rs#L70-L72
template<uint8_t domain_separator, size_t ds_bits, size_t rate>
-static inline constexpr std::array<uint8_t,
-rate / std::numeric_limits<uint8_t>::digits>
+static inline constexpr std::array<uint8_t, rate / std::numeric_limits<uint8_t>::digits>
pad10x1(const size_t offset)
requires(check_domain_separator(ds_bits))
{
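The body of `pad10x1` is elided from this hunk. The comment above it can be read as follows: place the domain separator bits followed by the first padding `1` bit at byte `offset`, and set the final padding `1` bit in the top bit of the last rate byte. A minimal sketch under that reading is below. `pad10x1_sketch` is a hypothetical name, and the separator values used in the checks (`0b10` for SHA3, `0b1111` for SHAKE, both least-significant-bit first) are assumptions about how a caller would encode them, not taken from the source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the padding block; the caller must ensure offset < rate / 8.
template<uint8_t domain_separator, std::size_t ds_bits, std::size_t rate>
constexpr std::array<uint8_t, rate / 8>
pad10x1_sketch(const std::size_t offset)
{
  static_assert((ds_bits == 2) || (ds_bits == 4), "domain separator must be 2 or 4 bits wide");

  constexpr std::size_t rbytes = rate / 8;
  constexpr uint8_t mask = static_cast<uint8_t>((1u << ds_bits) - 1u);
  // Separator bits in the low positions, immediately followed by the leading `1` of 10*1.
  constexpr uint8_t pad_byte = static_cast<uint8_t>((1u << ds_bits) | (domain_separator & mask));

  std::array<uint8_t, rbytes> res{};
  res[offset] = pad_byte;
  res[rbytes - 1] ^= 0x80; // trailing `1` of 10*1; XOR handles offset == rbytes - 1

  return res;
}

// Well-known padding bytes: 0x06 for SHA3 and 0x1f for SHAKE.
static_assert(pad10x1_sketch<0x02, 2, 1088>(0)[0] == 0x06, "SHA3-256 style padding byte");
static_assert(pad10x1_sketch<0x0f, 4, 1344>(0)[0] == 0x1f, "SHAKE128 style padding byte");
```

When the message fills all but the last byte of the rate, both markers collapse into a single byte, for example `0x86` for the SHA3 case.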
@@ -54,9 +47,8 @@ pad10x1(const size_t offset)
return res;
}

-// Given `mlen` (>=0) -bytes message, this routine consumes it into Keccak[c]
-// permutation state s.t. `offset` ( second parameter ) denotes how many bytes
-// are already consumed into rate portion of the state.
+// Given `mlen` (>=0) -bytes message, this routine consumes it into Keccak[c] permutation state s.t. `offset` ( second
+// parameter ) denotes how many bytes are already consumed into rate portion of the state.
//
// - `rate` portion of sponge will have bitwidth of 1600 - c.
// - `offset` must ∈ [0, `rbytes`).
@@ -65,9 +57,7 @@ pad10x1(const size_t offset)
// https://github.com/itzmeanjan/turboshake/blob/e1a6b950/src/sponge.rs#L4-L56
template<size_t rate>
static inline constexpr void
-absorb(uint64_t state[keccak::LANE_CNT],
-size_t& offset,
-std::span<const uint8_t> msg)
+absorb(uint64_t state[keccak::LANE_CNT], size_t& offset, std::span<const uint8_t> msg)
{
constexpr size_t rbytes = rate >> 3u; // # -of bytes
constexpr size_t rwords = rbytes >> 3u; // # -of 64 -bit words
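Most of `absorb` lies outside the visible lines. The `rwords` constant suggests the real routine works on whole 64-bit words; the sketch below deliberately does the simplest thing instead and XORs one byte at a time, to show how `offset` tracks progress through the rate and when the permutation fires. Everything here is hypothetical: `absorb_sketch`, the pointer-plus-length interface, the little-endian byte placement within a lane, and the stand-in `count_calls_sketch`, which only counts invocations in a capacity lane instead of permuting.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte-wise absorb: XOR message bytes into the rate portion, permute whenever the rate is full.
template<std::size_t rate, typename Permute>
constexpr void
absorb_sketch(std::array<uint64_t, 25>& state, std::size_t& offset, const uint8_t* msg, const std::size_t mlen, Permute permute)
{
  constexpr std::size_t rbytes = rate / 8;

  for (std::size_t i = 0; i < mlen; i++) {
    // Byte `offset` of the rate lives in lane offset / 8, at byte position offset % 8.
    state[offset / 8] ^= static_cast<uint64_t>(msg[i]) << (8 * (offset % 8));
    offset++;

    if (offset == rbytes) {
      permute(state);
      offset = 0;
    }
  }
}

// Stand-in for the real permutation: bumps a counter kept in the last ( capacity ) lane.
constexpr void
count_calls_sketch(std::array<uint64_t, 25>& state)
{
  state[24]++;
}

struct absorb_result_sketch
{
  std::size_t offset;
  uint64_t permutations;
  uint64_t lane0;
};

// Absorbs `mlen` ( <= 512 ) bytes of value 0x01 into a fresh state with rate = 1088 bits.
constexpr absorb_result_sketch
absorb_demo_sketch(const std::size_t mlen)
{
  std::array<uint64_t, 25> state{};
  std::array<uint8_t, 512> msg{};
  for (std::size_t i = 0; i < mlen; i++) {
    msg[i] = 0x01;
  }

  std::size_t offset = 0;
  absorb_sketch<1088>(state, offset, msg.data(), mlen, count_calls_sketch);
  return { offset, state[24], state[0] };
}
```

With a 136-byte rate, absorbing 275 bytes triggers the permutation twice and leaves `offset` at 3, which is the invariant `offset ∈ [0, rbytes)` from the comment.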
@@ -118,11 +108,9 @@ absorb(uint64_t state[keccak::LANE_CNT],
offset += rm_bytes;
}

-// Given that N message bytes are already consumed into Keccak[c] permutation
-// state, this routine finalizes sponge state and makes it ready for squeezing,
-// by appending ( along with domain separation bits ) 10*1 padding bits to input
-// message s.t. total absorbed message byte length becomes multiple of
-// `rate/ 8` -bytes.
+// Given that N message bytes are already consumed into Keccak[c] permutation state, this routine finalizes sponge state
+// and makes it ready for squeezing, by appending ( along with domain separation bits ) 10*1 padding bits to input
+// message s.t. total absorbed message byte length becomes multiple of `rate/ 8` -bytes.
//
// - `rate` portion of sponge will have bitwidth of 1600 - c.
// - `offset` must ∈ [0, `rbytes`)
@@ -153,8 +141,8 @@ finalize(uint64_t state[keccak::LANE_CNT], size_t& offset)
offset = 0;
}

-// Given that Keccak[c] permutation state is finalized, this routine can be
-// invoked for squeezing `olen` -bytes out of rate portion of the state.
+// Given that Keccak[c] permutation state is finalized, this routine can be invoked for squeezing `olen` -bytes out of
+// rate portion of the state.
//
// - `rate` portion of sponge will have bitwidth of 1600 - c.
// - `squeezable` denotes how many bytes can be squeezed without permuting the
@@ -166,9 +154,7 @@ finalize(uint64_t state[keccak::LANE_CNT], size_t& offset)
// https://github.com/itzmeanjan/turboshake/blob/e1a6b950/src/sponge.rs#L83-L118
template<size_t rate>
static inline constexpr void
-squeeze(uint64_t state[keccak::LANE_CNT],
-size_t& squeezable,
-std::span<uint8_t> out)
+squeeze(uint64_t state[keccak::LANE_CNT], size_t& squeezable, std::span<uint8_t> out)
{
constexpr size_t rbytes = rate >> 3u; // # -of bytes
constexpr size_t rwords = rbytes >> 3u; // # -of 64 -bit words
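The rest of `squeeze` is not shown. As a rough model of the `squeezable` bookkeeping described in the comment, the byte-wise sketch below reads output from the rate portion and permutes only when nothing squeezable is left. It is an illustration, not the word-oriented routine from the header: `squeeze_sketch`, the pointer-plus-length interface, the little-endian byte order within a lane, and the counting stand-in for the permutation are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Byte-wise squeeze: emit rate bytes in order, permuting once the current block is exhausted.
template<std::size_t rate, typename Permute>
constexpr void
squeeze_sketch(std::array<uint64_t, 25>& state, std::size_t& squeezable, uint8_t* out, const std::size_t olen, Permute permute)
{
  constexpr std::size_t rbytes = rate / 8;

  for (std::size_t i = 0; i < olen; i++) {
    if (squeezable == 0) {
      permute(state);
      squeezable = rbytes;
    }

    const std::size_t off = rbytes - squeezable; // next unread byte of the rate
    out[i] = static_cast<uint8_t>(state[off / 8] >> (8 * (off % 8)));
    squeezable--;
  }
}

// Stand-in for the real permutation: bumps a counter kept in the last ( capacity ) lane.
constexpr void
count_calls_sketch(std::array<uint64_t, 25>& state)
{
  state[24]++;
}

struct squeeze_result_sketch
{
  uint8_t last;
  std::size_t squeezable;
  uint64_t permutations;
};

// Squeezes `olen` ( 1 <= olen <= 512 ) bytes from a state whose first lane holds bytes 1..8, rate = 1088 bits.
constexpr squeeze_result_sketch
squeeze_demo_sketch(const std::size_t olen)
{
  std::array<uint64_t, 25> state{};
  state[0] = 0x0807060504030201ull;

  std::array<uint8_t, 512> out{};
  std::size_t squeezable = 1088 / 8; // a freshly finalized state has a full block available
  squeeze_sketch<1088>(state, squeezable, out.data(), olen, count_calls_sketch);
  return { out[olen - 1], squeezable, state[24] };
}
```

Requesting 137 bytes from a 136-byte rate forces exactly one permutation, after which reading restarts at byte 0 of the rate.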