-
Notifications
You must be signed in to change notification settings - Fork 401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid implicit conversions for bitwise operators #2708
base: master
Are you sure you want to change the base?
Avoid implicit conversions for bitwise operators #2708
Conversation
d3732a7
to
bedefc1
Compare
#define UNARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP) \ | ||
struct NAME \ | ||
{ \ | ||
template <class A1> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is the name A1
something that was used in similar bits of codes. If not, I would find e.g. T
more logical (or even A
or E
which is used in many places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing UNARY_OPERATOR_FUNCTOR
indeed also uses A1
. I'll happily change both to T
, A
, or E
.
return OP arg; \ | ||
} \ | ||
template <class B> \ | ||
constexpr auto simd_apply(const B& arg) const \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question out of ignorance: Why don't you use std::decay_t<B>
here.
Style: please use the same template name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::decay_t<B>
indeed is consistent with the return type of operator()
.
Style: Do you mean I should use the same template argument name in operator()
and simd_apply
, e.g., use template <class T>
in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great, yes!
constexpr std:: \ | ||
conditional_t<(sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>)), std::decay_t<T1>, std::decay_t<T2>> \ | ||
operator()(T1&& arg1, T2&& arg2) const \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what happens here: the opening and closing (..)
and <..>
don't even seem to match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tricky part of this expression is that one of the >
is a greater-than sign:
The first argument to std::condition_t
is sizeof(std::decay_t<T1>) > sizeof(std::decay_t<T2>)
. Because of the greater-than sign, I'm using extra parentheses around it. I can probably replace it by std::greater
, however, I'm afraid the expression will mainly get longer and doesn't become more readable.
The other arguments to std::condition_t
are std::decay_t<T1>
and std::decay_t<T2>
. So, the conditional selects the largest type of T1 and T2. A (bitwise) operator on two char
s will now yield a char
, instead of an int
. Combining a short
and a long
yields a long
.
Combining two equally-sized unsigned and signed types currently yields the second type, e.g, combining uint16_t
and int16_t
yields an int16_t
return value. For bitwise operations, this shouldn't be a problem. Note that I can easily make it return the first type using >=
instead of >
when comparing the sizes.
For other operations, like addition or subtraction, I can imagine we have to follow the C++ rules closer. I found the following results when using decltype
with g++ 11.3:
-
decltype(int8_t() + int8_t()) ->
int
-
decltype(uint8_t() + uint8_t()) -> int`
-
decltype(int8_t() + uint8_t()) ->
int
-
decltype(uint8_t() + int8_t()) ->
int
-
(u)int16_t: Same result as (u)int8_t: Always
int
. -
decltype(int32_t() + int32_t()) ->
int32_t
-
decltype(uint32_t() + uint32_t()) -> uint32_t`
-
decltype(int32_t() + uint32_t()) ->
uint32_t
-
decltype(uint32_t() + int32_t()) ->
uint32_t
-
decltype(int32_t() + int64_t()) ->
int64_t
-
decltype(uint32_t() + uint64_t()) -> uint64_t`
-
decltype(int32_t() + uint64_t()) ->
uint64_t
-
decltype(uint32_t() + int64_t()) ->
int64_t
-
decltype(int32_t() + float()) ->
float
-
decltype(uint32_t() + float()) ->
float
-
decltype(float() + uint32_t()) ->
float
-
decltype(float() + int32_t()) ->
float
-
decltype(int64_t() + float()) ->
float
(even thoughsizeof(float)
(4 bytes) is smaller thansizeof(int64_t)
! -
decltype(uint64_t() + float()) ->
float
-
decltype(float() + uint64_t()) ->
float
-
decltype(float() + int64_t()) ->
float
Perhaps the following stategy will work for all binary operations:
- If the input types are equal, use that as the return type.
- If both types are floating point types, return the largest type.
- If only one of the types is a floating point type, return the floating point type.
- If the type sizes differ, return the largest type (which can be signed).
- If the type sizes are equal and they only differ in signedness, return the unsigned type.
Code similar to simd_return_type_impl
in XSimd could implement this return type strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the clarification!! I'm just not sure about the latter case: stripping the signedness is dangerous here no?
#define BINARY_BITWISE_OPERATOR_FUNCTOR(NAME, OP) \ | ||
struct NAME \ | ||
{ \ | ||
template <class T1, class T2> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use T
and S
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing BINARY_OPERATOR_FUNCTOR
uses T1
and T2
, too. Using T1
and T2
as names reflects the fact that the types are typically similar or the same, so I rather stick to T1
and T2
.
using xt::detail::operator OP; \ | ||
return (std::forward<T1>(arg1) OP std::forward<T2>(arg2)); \ | ||
} \ | ||
template <class B> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you allow the two types here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing BINARY_OPERATOR_FUNCTOR
also has a single template argument for simd_apply
. I don't know why. Using two arguments seems more logical, indeed.
Thanks. This is not my expertise, but I do already have some questions. |
@tdegeus the checks on this appear to be stuck burning up compute time. |
This PR partially addresses #2707.
When using XSimd, the following code will compile, however, the resulting code is inefficient if the return type of
operator|(const char&, const char&)
isint
instead ofchar
. [C++ allows implicitly converting/promotingchar
toint
.](https://en.cppreference.com/w/cpp/language/implicit_conversion).
The generated assembly code performs the operation using 32-bit integers and contains many pack and unpack instructions around it. Supplying explict return types in the functors instead of 'auto' avoids those pack and unpack instructions and has a single 'vpor' instruction (when using avx2) that or's all 16 values at once.
Although this improvement works for small integral values like
char
andunsigned short
, usingbool
still results in inefficient code sincext_simd::simd_condition<bool, bool>::value
is false.xsimd::detail::simd_condition
explicitly is false when both types arebool
, for unknown reasons.I have only applied the improvement to bitwise operators, since it's probably not safe for other operators. For example, adding characters may overflow so there the implicit conversion to
int
is useful. On the other side, addingint
s themselves could also overflow and they are not promoted to other types. Perhaps this optimization is possible for all operators.