How to implement an expand function / scatter for masks? #975

avitase · 2023-11-17T19:36:43Z

How can I implement an efficient "expand" function that behaves like the family of *_mask_expand_epi* intrinsics of the AVX-512 instruction set? As gather/scatter already have their own counterparts in xsimd's API, is it worth doing the same for compress/expand?


serge-sans-paille · 2023-11-18T23:48:52Z

Mmmh, that's an interesting scenario. It's going to be quite difficult to implement this in an efficient generic manner, but I understand the pattern. Let's discuss the expected API. Following the AVX512 intrinsics, à la xsimd, that could be

xsimd::batch<T, A> xsimd::expand(xsimd::batch<T, A> self, xsimd::batch_bool<T, A> mask, xsimd::batch<T, A> filler);

The main problem probably is that we're likely to require shuffles to implement this operation, and we only have shuffle with constant masks, so we actually can provide a generic

xsimd::batch<T, A> xsimd::expand(xsimd::batch<T, A> self, xsimd::batch_bool_constant<T, A> mask, xsimd::batch<T, A> filler);

Unfortunately, a constant mask probably has very little use...

avitase · 2023-11-19T14:15:40Z

@serge-sans-paille, thanks for your reply!

> xsimd::batch<T, A> xsimd::expand(xsimd::batch<T, A> self, xsimd::batch_bool<T, A> mask, xsimd::batch<T, A> filler);

I am not sure about the purpose of filler. Naively, I would expect that mask indicates what part of self should be distributed into an empty array. Do you propose to do this operation in filler instead?

> The main problem probably is that we're likely to require shuffles to implement this operation

That's what I was wondering about as well. However, for me it doesn't feel right since expand is simpler/less powerful than shuffle or scatter because it does not reorder or even duplicate and shouldn't be implemented in terms of these more costly/complex operations. However, if no hardware support is available, how bad would be a generic fallback with a raw for loop? The same holds for compress...
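For reference, the raw-loop fallback for compress mentioned above could look like the following minimal sketch. It is written in Python purely for illustration; the function name `compress` and the `fill` parameter are assumptions and do not reflect xsimd's actual API.

```python
def compress(mask, a, fill=0):
    """Pack the lanes of `a` whose mask bit is set to the front of the result.

    Hypothetical helper for illustration only; remaining lanes get `fill`.
    """
    dst = [fill] * len(mask)
    m = 0  # next free output lane
    for i, bit in enumerate(mask):
        if bit:
            dst[m] = a[i]
            m += 1
    return dst


print(compress([False, False, True, False, True, True, False], list("abcdefg")))
# → ['c', 'e', 'f', 0, 0, 0, 0]
```

Like expand, this never reorders or duplicates lanes; it only closes the gaps left by unset mask bits.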

serge-sans-paille · 2023-11-20T06:57:15Z

*mask_expand_epi* has a filler argument that can be used to fill the empty parts of the array. I'm fine with having the filler always be zero if it makes the generic function easier / more efficient to write.
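To make the role of the filler concrete, here is a small Python sketch of the two flavours as I understand them: the `mask` variant keeps the filler's lanes where the mask is unset, and the `maskz` variant uses zero instead. The names `mask_expand` and `maskz_expand` and the argument order are assumptions modelled loosely on the intrinsics, not xsimd functions.

```python
def mask_expand(src, k, a):
    """Expand consecutive elements of `a` into the lanes selected by `k`.

    Lanes whose mask bit is unset keep the value from `src` (the filler).
    Illustrative sketch only.
    """
    dst = list(src)
    m = 0  # index of the next element of `a` to consume
    for i, bit in enumerate(k):
        if bit:
            dst[i] = a[m]
            m += 1
    return dst


def maskz_expand(k, a):
    """Same operation with an all-zero filler."""
    return mask_expand([0] * len(k), k, a)


k = [False, False, True, False, True, True, False]
print(mask_expand(list("tuvwxyz"), k, "abcdefg"))
# → ['t', 'u', 'a', 'w', 'b', 'c', 'z']
print(maskz_expand(k, "abcdefg"))
# → [0, 0, 'a', 0, 'b', 'c', 0]
```

Fixing the filler to zero collapses both variants into `maskz_expand`, which is the simplification suggested above.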

After some extra thoughts:
expand with a static mask is just a specialization of a single shuffle. Not super fancy but why not. The generic gather relies on a sequence of generic inserts, not super fancy either :-/

But wait, we already have a generic swizzle, and an expand with zeroes can be expressed in terms of swizzle, I'll give it a try!

avitase · 2023-11-20T10:10:40Z

Oh, you are right. I was looking at the maskz commands - my bad. Looking forward to seeing what you come up with:) Personally, I struggled to see how I could use swizzle to enumerate the mask, e.g., let's say we want to expand a b c d e f g according to the mask 0 0 1 0 1 1 0 to get 0 0 a 0 b c 0. The mask for swizzle then has to be ? ? 0 ? 1 2 ? where the ? could be set to any value as the final result has to be selected by the original mask anyhow. The question is, how to do this enumeration/accumulation/scan of the 1s in 0 0 1 0 1 1 0 to get ? ? 0 ? 1 2 ? w/o expand on an index sequence? 🐔🥚 If the answer is a raw for-loop then this solution is no better than implementing expand with such a loop in the first place but maybe I am missing something here.
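One way to phrase the enumeration asked about here is as an exclusive prefix sum (scan) over the mask bits: the index for a set lane is the number of set lanes before it. The Python sketch below only illustrates that idea; `expand_indices` is a hypothetical name, and `None` stands for the "?" lanes.

```python
from itertools import accumulate


def expand_indices(mask):
    """Return the swizzle index for each set lane, None for unset lanes.

    The index of a set lane equals the count of set lanes strictly before it,
    i.e. the inclusive prefix sum of the bits minus one.
    """
    bits = [int(b) for b in mask]
    inclusive = accumulate(bits)  # running count of set bits
    return [inc - 1 if b else None for inc, b in zip(inclusive, bits)]


print(expand_indices([0, 0, 1, 0, 1, 1, 0]))
# → [None, None, 0, None, 1, 2, None]
```

A prefix sum over N lanes can in principle be computed in log2(N) shift-and-add steps instead of N sequential ones, but whether that beats a plain loop for small batches depends on the target and is not something this sketch measures.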

serge-sans-paille · 2023-11-20T21:27:42Z

mmh the example you give is not an expand, it's just a bit mask, and you just need to bitwise_and your data and the mask to get the expected result. In your example, an expand would yield a b c 0 0 0 0, and indeed a swizzle is not the answer.

avitase · 2023-11-21T09:03:19Z

I think our discussion finally converges towards the same problem. My initial question was about *mask{z}_expand_* as defined by Intel Intrinsics Guide or in APL (with a subsequent truncation). Translating the pseudo-code from the Intel guide to Python I get something like this:

def _mm_maskz_expand_epi8(k, a):
    N = len(k)
    dst = [0] * N

    m = 0
    for i in range(N):
        if k[i]:
            dst[i] = a[m]
            m += 1

    return dst

Obviously, my translation of N = len(k) is not quite right, however, for my previous example, this indeed yields:

>>> _mm_maskz_expand_epi8([False, False, True, False, True, True, False], "abcdefg")
[0, 0, 'a', 0, 'b', 'c', 0]

So my question is how to implement this in the generic case when this intrinsic is not available? Is it possible to do this w/o a raw for-loop? At the same time, this question becomes a feature request for xsimd:)


serge-sans-paille · 2023-11-22T14:47:45Z

Can you check if the code in #977 matches your performance expectation (on non - AVX512 targets, that is)?

The previous implementation relied on a raw for-loop that compiles an index sequence that was subsequently fed into a combination of select and swizzle. The new version doesn't need the latter two steps but compiles the result directly in the for-loop.
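The difference between the two versions described above can be sketched in Python as follows. This is a guess at the general shape, not the code from the pull requests; both function names are hypothetical, and the list comprehensions stand in for swizzle and select.

```python
def expand_via_indices(mask, a):
    """Two-stage variant: build an index sequence, then swizzle and select."""
    n = len(mask)
    idx = [0] * n
    m = 0
    for i in range(n):
        idx[i] = m  # unset lanes get an arbitrary but valid index
        if mask[i]:
            m += 1
    swizzled = [a[j] for j in idx]  # stand-in for a swizzle
    return [v if bit else 0 for v, bit in zip(swizzled, mask)]  # stand-in for select


def expand_direct(mask, a):
    """Single-stage variant: write the result directly inside the loop."""
    dst = [0] * len(mask)
    m = 0
    for i, bit in enumerate(mask):
        if bit:
            dst[i] = a[m]
            m += 1
    return dst


mask = [False, False, True, False, True, True, False]
print(expand_via_indices(mask, "abcdefg"))
# → [0, 0, 'a', 0, 'b', 'c', 0]
print(expand_direct(mask, "abcdefg"))
# → [0, 0, 'a', 0, 'b', 'c', 0]
```

Both produce the same result; the direct variant simply skips the intermediate index sequence and the two extra passes over it.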

avitase · 2023-11-22T19:16:38Z

Your solution works. Thanks:) However, I have pruned your solution a bit. What do you think about #978 ?


serge-sans-paille · 2023-12-02T16:05:04Z

Fixed as of #981 and #977

avitase changed the title ~~How to implement an scatter for masks?~~ How to implement an expand function / scatter for masks? Nov 17, 2023

serge-sans-paille added a commit that referenced this issue Nov 21, 2023

Generic, simple implementation fox xsimd::expand

8dd0e7a

Related to #975

serge-sans-paille added a commit that referenced this issue Nov 21, 2023

Generic, simple implementation fox xsimd::expand

d5e3dbc

Related to #975

serge-sans-paille mentioned this issue Nov 22, 2023

Generic, simple implementation fox xsimd::expand #977

Merged

serge-sans-paille added a commit that referenced this issue Nov 22, 2023

Generic, simple implementation fox xsimd::expand

bbe1354

Related to #975

serge-sans-paille added a commit that referenced this issue Nov 23, 2023

Generic, simple implementation fox xsimd::expand

2c9fdd2

Also provide a specialization for avx512f. Related to #975

serge-sans-paille added a commit that referenced this issue Nov 27, 2023

Generic, simple implementation fox xsimd::compress

4222a13

Related to #975

serge-sans-paille mentioned this issue Nov 27, 2023

Generic, simple implementation fox xsimd::compress #981

Merged

serge-sans-paille added a commit that referenced this issue Nov 28, 2023

Generic, simple implementation fox xsimd::compress

fdd30c4

Related to #975

serge-sans-paille added a commit that referenced this issue Nov 28, 2023

Generic, simple implementation fox xsimd::expand

b6827c1

Also provide a specialization for avx512f. Related to #975

serge-sans-paille added a commit that referenced this issue Nov 29, 2023

Generic, simple implementation fox xsimd::expand

02c76a5

Also provide a specialization for avx512f. Related to #975

serge-sans-paille added a commit that referenced this issue Nov 29, 2023

Generic, simple implementation fox xsimd::expand

778f1c7

Also provide a specialization for avx512f. Related to #975

serge-sans-paille added a commit that referenced this issue Nov 29, 2023

Generic, simple implementation fox xsimd::expand

7a275ba

Also provide a specialization for avx512f. Related to #975

serge-sans-paille closed this as completed Dec 2, 2023
