How to implement an expand function / scatter for masks? #975
Mmmh, that's an interesting scenario. It's going to be quite difficult to implement this in an efficient, generic manner, but I understand the pattern. Let's discuss the expected API. Following the AVX512 intrinsics, à la xsimd, that could be
The main problem is probably that we're likely to need shuffles to implement this operation, and we only have shuffles with constant masks, so we can actually provide a generic
Unfortunately, a constant mask is probably of very little use...
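To make the semantics under discussion concrete, here is a minimal scalar reference model of a zero-masking expand. It is a sketch, not xsimd code: plain `std::vector`s stand in for a batch and its mask, and the function name `maskz_expand` is hypothetical. It shows why a constant-mask shuffle is not enough: the source position each active lane reads from depends on how many mask bits are set below it, which is only known at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference model (not xsimd code) of a zero-masking
// expand, i.e. the semantics of AVX-512's _mm512_maskz_expand_epi32.
// Lane i, if active, receives src[k], where k is the number of active lanes
// below i; inactive lanes are set to zero.
std::vector<int> maskz_expand(const std::vector<int>& src,
                              const std::vector<bool>& mask) {
    assert(mask.size() == src.size());
    std::vector<int> out(src.size(), 0);
    std::size_t next = 0;  // next element of src to be consumed
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (mask[i]) {
            out[i] = src[next++];  // source position depends on the runtime mask
        }
    }
    return out;
}
```

For example, `maskz_expand({10, 20, 30, 40}, {false, true, false, true})` yields `{0, 10, 0, 20}`.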
@serge-sans-paille, thanks for your reply!
I am not sure about the purpose of
That's what I was wondering about as well. However, to me it doesn't feel right, since
After some more thought: wait, we already have a generic swizzle, and an expand with zeros can be expressed in terms of swizzle. I'll give it a try!
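The idea of expressing an expand with zeros through a swizzle can be sketched in scalar form, assuming a runtime-index permutation is available. The per-lane index is the exclusive prefix count of set mask bits; a permutation by those indices moves the source elements into place, and a select zeroes the inactive lanes. `expand_indices`, `scalar_swizzle` and `scalar_select` are hypothetical stand-ins for illustration, not xsimd's functions or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-ins for SIMD building blocks (not xsimd's API).

// Exclusive prefix popcount of the mask: idx[i] = number of active lanes
// strictly below i. Since idx[i] <= i, it is always a valid lane index,
// even for inactive lanes whose value is discarded later.
std::vector<std::size_t> expand_indices(const std::vector<bool>& mask) {
    std::vector<std::size_t> idx(mask.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        idx[i] = count;
        if (mask[i]) {
            ++count;
        }
    }
    return idx;
}

// Runtime permutation: out[i] = src[idx[i]].
std::vector<int> scalar_swizzle(const std::vector<int>& src,
                                const std::vector<std::size_t>& idx) {
    std::vector<int> out(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) {
        out[i] = src[idx[i]];
    }
    return out;
}

// Lane-wise blend: out[i] = mask[i] ? a[i] : b[i].
std::vector<int> scalar_select(const std::vector<bool>& mask,
                               const std::vector<int>& a,
                               const std::vector<int>& b) {
    std::vector<int> out(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i) {
        out[i] = mask[i] ? a[i] : b[i];
    }
    return out;
}

// Expand with zeros = swizzle by the prefix-count indices, then zero the
// inactive lanes with a select.
std::vector<int> maskz_expand_via_swizzle(const std::vector<int>& src,
                                          const std::vector<bool>& mask) {
    assert(mask.size() == src.size());
    const std::vector<int> permuted = scalar_swizzle(src, expand_indices(mask));
    return scalar_select(mask, permuted, std::vector<int>(src.size(), 0));
}
```

With `src = {10, 20, 30, 40}` and `mask = {false, true, false, true}`, the indices are `{0, 0, 1, 1}`, the permutation gives `{10, 10, 20, 20}`, and the select produces `{0, 10, 0, 20}`. Note that computing the indices is itself still a loop in this sketch.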
Oh, you are right. I was looking at the maskz intrinsics, my bad. Looking forward to seeing what you come up with :) Personally, I struggled to see how I could use
Mmh, the example you give is not an expand; it's just a bit mask, and you just need to
I think our discussion finally converges towards the same problem. My initial question was about
Obviously, my translation of
So my question is: how can this be implemented in the generic case, when this intrinsic is not available? Is it possible to do this without a raw for-loop? At the same time, this question becomes a feature request for xsimd :)
Can you check whether the code in #977 matches your performance expectations (on non-AVX512 targets, that is)?
The previous implementation relied on a raw for-loop that computed an index sequence, which was then fed into a combination of select and swizzle. The new version skips those two steps and computes the result directly in the for-loop.
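The same "compute the result directly in the loop" approach applies to compress, the inverse of expand. Below is a minimal scalar model of zero-masking compress semantics (as in AVX-512's `_mm512_maskz_compress_epi32`): active elements are packed, in order, toward lane 0 and the remaining lanes are zero-filled. It is an illustrative sketch with a hypothetical name, not the code from the linked pull requests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model (not xsimd's implementation) of a zero-masking
// compress: active elements of src are packed, in order, toward lane 0;
// the remaining lanes are zero. The result is written directly in the loop,
// without an intermediate index vector, swizzle or select.
std::vector<int> maskz_compress(const std::vector<int>& src,
                                const std::vector<bool>& mask) {
    assert(mask.size() == src.size());
    std::vector<int> out(src.size(), 0);
    std::size_t next = 0;  // next output lane to fill
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (mask[i]) {
            out[next++] = src[i];
        }
    }
    return out;
}
```

For example, `maskz_compress({10, 20, 30, 40}, {false, true, false, true})` yields `{20, 40, 0, 0}`.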
Your solution works, thanks :) However, I have pruned it a bit. What do you think about #978?
Also provide a specialization for avx512f. Related to #975
Generic, simple implementation for xsimd::compress. Related to #975
How can I implement an efficient "expand" function that behaves like the `*_mask_expand_epi*` family of intrinsics in the AVX-512 instruction set? As gather/scatter already have their own counterparts in xsimd's API, is it worth doing the same for compress/expand?
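Unlike the `maskz` variants, the `*_mask_expand_epi*` intrinsics use merge masking: lanes whose mask bit is clear keep the value from a pass-through source instead of being zeroed. A minimal scalar model of that behavior, with the argument order of `_mm512_mask_expand_epi32(src, k, a)`, might look like the sketch below; `mask_expand` is a hypothetical name and plain vectors stand in for registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a merge-masking expand, i.e. the semantics of
// AVX-512's _mm512_mask_expand_epi32(src, k, a): active lanes take
// consecutive elements from the low end of a; inactive lanes keep src[i].
std::vector<int> mask_expand(const std::vector<int>& src,
                             const std::vector<bool>& mask,
                             const std::vector<int>& a) {
    assert(src.size() == mask.size() && a.size() == mask.size());
    std::vector<int> out(src);  // start from the pass-through values
    std::size_t next = 0;       // next element of a to be consumed
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) {
            out[i] = a[next++];
        }
    }
    return out;
}
```

Passing an all-zero `src` reduces this to the zero-masking expand.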