Possible missed optimization when calling memcpy or memmove in a loop #117332

Open
ldionne opened this issue Nov 22, 2024 · 6 comments

Comments

@ldionne
Member

ldionne commented Nov 22, 2024

I noticed that the following code does not optimize to a single memcpy, contrary to what I expected:

#include <cstring>

template <class T>
void relocate_1(T *first, T *last, T *dest) {
    for ( ; first != last; ++first, ++dest) {
        std::memcpy((void*)dest, first, sizeof(T));
    }
}

I would expect this to be equivalent to roughly:

template <class T>
void relocate_2(T *first, T *last, T *dest) {
    auto n = last - first;
    // memcpy takes a byte count, so scale the element count by sizeof(T)
    std::memcpy((void*)dest, first, n * sizeof(T));
}

Is this caused by, e.g., the compiler not knowing that the entire [first, last) range is valid? Note that neither GCC nor Clang performs this optimization.

Godbolt: https://godbolt.org/z/zzdhcKPh4

@keinflue

The behavior is not the same. Suppose, for example, that dest == first + 1: the loop then copies *first into every element of the destination range. A single std::memcpy over the whole range would be UB because the ranges overlap, and a single std::memmove would not produce the same result.
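A minimal sketch of the overlap case described above, assuming `T = int` and a five-element buffer. `overlap_demo` is a hypothetical helper written only for illustration; it runs the element-wise loop and a single `std::memmove` on the same overlapping input (dest == first + 1) and asserts the two different results. It deliberately never calls `std::memcpy` on the overlapping range, since that would be UB.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// relocate_1 from the issue: one memcpy per element. Each individual call
// copies between two distinct, non-overlapping objects, so it is well defined.
template <class T>
void relocate_1(T *first, T *last, T *dest) {
    for (; first != last; ++first, ++dest) {
        std::memcpy((void*)dest, first, sizeof(T));
    }
}

// Hypothetical demo: compare the loop with a single memmove when dest == first + 1.
void overlap_demo() {
    std::array<int, 5> a = {1, 2, 3, 4, 5};
    std::array<int, 5> b = a;

    // Element-wise loop: each step reads the value the previous step just
    // wrote, so a[0] ends up copied across the whole destination range.
    relocate_1(a.data(), a.data() + 4, a.data() + 1);
    assert((a == std::array<int, 5>{1, 1, 1, 1, 1}));

    // Single memmove: behaves as if the source were copied out first, so the
    // original values 1, 2, 3, 4 are shifted right by one slot.
    std::memmove(b.data() + 1, b.data(), 4 * sizeof(int));
    assert((b == std::array<int, 5>{1, 1, 2, 3, 4}));
}
```

Because the two results differ, the compiler cannot rewrite the loop as one call unless it can prove the ranges do not overlap.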

@nikic
Contributor

nikic commented Nov 22, 2024

The memcpy calls do get combined if you restrict-qualify dest.
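A sketch of what restrict-qualifying `dest` looks like, assuming a compiler that supports the `__restrict` extension (GCC, Clang and MSVC do; it is not standard C++). `relocate_restrict`, `relocate_single` and `restrict_demo` are hypothetical names. The sketch cannot show whether a given compiler emits one call; it only checks that, for non-overlapping buffers, the loop and the single-call form copy the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// relocate_1 with a restrict-qualified destination. __restrict promises that
// memory written through dest is not also accessed through first, i.e. the
// ranges do not overlap, which removes the obstacle to merging the copies.
template <class T>
void relocate_restrict(T *first, T *last, T *__restrict dest) {
    for (; first != last; ++first, ++dest) {
        std::memcpy((void*)dest, first, sizeof(T));
    }
}

// The single-call form. memcpy takes a size in bytes, so the element count
// is scaled by sizeof(T).
template <class T>
void relocate_single(T *first, T *last, T *__restrict dest) {
    std::size_t n = static_cast<std::size_t>(last - first);
    std::memcpy((void*)dest, first, n * sizeof(T));
}

// Hypothetical check: both forms produce identical bytes for disjoint buffers.
void restrict_demo() {
    double src[4] = {1.5, 2.5, 3.5, 4.5};
    double d1[4] = {};
    double d2[4] = {};
    relocate_restrict(src, src + 4, d1);
    relocate_single(src, src + 4, d2);
    assert(std::memcmp(d1, src, sizeof src) == 0);
    assert(std::memcmp(d2, src, sizeof src) == 0);
}
```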

@ldionne
Member Author

ldionne commented Nov 22, 2024

Aha! Thanks, both of you; that makes sense.

I can confirm that this gets optimized if I __restrict-qualify the destination: https://godbolt.org/z/heb71docW

However, if I switch to memmove, I don't get the same optimization (although GCC does perform it): https://godbolt.org/z/K8srPchTr
Is that a missed optimization?

@b1ackviking

For memcpy, overlapping ranges are UB, but memmove has to handle overlap:

The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest.

I could not find whether this is stated in the C++ standard. The above quote is from https://en.cppreference.com/w/cpp/string/byte/memmove

In C99, however, the difference is clear from the prototypes:

void* memcpy( void *restrict dest, const void *restrict src, size_t count );
void* memmove( void* dest, const void* src, size_t count );

@keinflue

@b1ackviking I don't think that was unknown to, or in dispute for, anyone in this thread. Or am I missing how this is relevant to the remaining question?

@b1ackviking

@keinflue I thought my comment might help explain why the compiler does not optimize the memmove loop even in the second case, which is not obvious to me. Sorry if it was inappropriate; I didn't mean to barge in just to say something.

5 participants