Possible missed optimization when calling memcpy or memmove in a loop #117332

ldionne · 2024-11-22T14:27:19Z

I noticed that the following code does not optimize to a single memcpy, contrary to what I would expect:

#include <cstring>

template <class T>
void relocate_1(T *first, T *last, T *dest) {
    for ( ; first != last; ++first, ++dest) {
        std::memcpy((void*)dest, first, sizeof(T));
    }
}

I would expect this to be equivalent to roughly:

template <class T>
void relocate_2(T *first, T *last, T *dest) {
    auto n = last - first;
    std::memcpy((void*)dest, first, n * sizeof(T)); // n counts elements; memcpy takes bytes
}

Is this caused by, e.g., the compiler not knowing that the entire [first, last) range is valid? Note that both GCC and Clang fail to perform this optimization.

Godbolt: https://godbolt.org/z/zzdhcKPh4


keinflue · 2024-11-22T15:19:33Z

The behavior is not the same. Suppose, for example, that dest == first + 1: the loop then copies *first into every element of the range. A single std::memcpy over the whole range would be UB, and std::memmove would not produce the same result.
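To make the difference concrete, here is a small sketch specialized to `int` with `dest == first + 1`. The helper names `copy_loop` and `copy_memmove` are hypothetical, chosen for illustration only; `copy_loop` mirrors relocate_1, and `copy_memmove` is the overlap-safe single-call alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Element-by-element copy, mirroring relocate_1 (hypothetical helper).
// Each individual memcpy copies one int between two distinct objects, so it
// is well defined even when the overall ranges overlap.
void copy_loop(const int *first, const int *last, int *dest) {
    assert(first <= last);
    for (; first != last; ++first, ++dest) {
        std::memcpy(dest, first, sizeof(int));
    }
}

// Single bulk copy that is defined for overlapping ranges (hypothetical helper).
void copy_memmove(const int *first, const int *last, int *dest) {
    assert(first <= last);
    std::memmove(dest, first, static_cast<std::size_t>(last - first) * sizeof(int));
}
```

Starting from `{1, 2, 3, 4, 5}`, `copy_loop(a, a + 4, a + 1)` leaves `{1, 1, 1, 1, 1}` because each step reads the value the previous step just wrote, while `copy_memmove` on the same input gives `{1, 1, 2, 3, 4}`. A compiler therefore cannot fuse the loop into one bulk copy unless it can rule out this kind of overlap.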

nikic · 2024-11-22T15:28:38Z

The memcpy calls do get combined if you restrict-qualify dest.
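A minimal sketch of the restrict-qualified variant, assuming a compiler that accepts the non-standard `__restrict` keyword (GCC and Clang do); the name `relocate_restrict` is hypothetical. The qualifier promises that `dest` does not alias `[first, last)`, which is what allows the optimizer to treat the loop as a single copy of `(last - first) * sizeof(T)` bytes. Whether that transformation actually happens depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstring>

// Same loop as relocate_1, but dest is restrict-qualified.
// __restrict is a compiler extension, not standard C++. Calling this with
// overlapping ranges violates the restrict promise and is undefined behavior.
template <class T>
void relocate_restrict(T *first, T *last, T *__restrict dest) {
    assert(first <= last);
    for (; first != last; ++first, ++dest) {
        std::memcpy((void *)dest, first, sizeof(T));
    }
}
```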

ldionne · 2024-11-22T16:07:06Z

Aha! Thanks, both of you; that makes sense.

I can see that this gets optimized if I mark the destination __restrict: https://godbolt.org/z/heb71docW

However, if I switch to memmove, I don't get the same optimization (GCC does perform it): https://godbolt.org/z/K8srPchTr
Is that one a missed optimization?
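For reference, a hedged sketch of the two shapes being compared, assuming the non-standard `__restrict` extension; both function names are hypothetical, and the exact code behind the Godbolt link may differ. Because the restrict qualifier rules out overlap between the source and destination ranges, the per-element `memmove` loop and one `memmove` of `n * sizeof(T)` bytes should produce identical results, which is the basis for asking whether the missing transformation is a missed optimization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Per-element memmove loop with a restrict-qualified destination (hypothetical name).
template <class T>
void relocate_memmove_loop(T *first, T *last, T *__restrict dest) {
    for (; first != last; ++first, ++dest) {
        std::memmove((void *)dest, first, sizeof(T));
    }
}

// The single call the loop could be collapsed into (hypothetical name).
// Since dest cannot overlap [first, last), one bulk copy is equivalent.
template <class T>
void relocate_memmove_single(T *first, T *last, T *__restrict dest) {
    assert(first <= last);
    std::size_t n = static_cast<std::size_t>(last - first);
    std::memmove((void *)dest, first, n * sizeof(T));
}
```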

b1ackviking · 2024-11-22T18:34:08Z

For memcpy, overlapping ranges are UB, but memmove has to handle overlap:

The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest.

I could not find whether this is stated in the C++ standard. The above quotation is from https://en.cppreference.com/w/cpp/string/byte/memmove

In C99, however, the difference is clear from the signatures:

void* memcpy( void *restrict dest, const void *restrict src, size_t count );
void* memmove( void* dest, const void* src, size_t count );
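The quoted "as if copied to a temporary character array" wording can be modeled directly. The sketch below is only a model of the specified semantics, and `memmove_via_temp` is a hypothetical name; practical implementations typically avoid the extra buffer by choosing the copy direction based on how the ranges overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte-wise model of memmove semantics: snapshot the source bytes into a
// temporary buffer, then copy the buffer into dest. Because every byte is read
// before any byte is written, overlap between src and dest cannot corrupt the result.
void *memmove_via_temp(void *dest, const void *src, std::size_t count) {
    assert(count == 0 || (dest != nullptr && src != nullptr));
    const unsigned char *s = static_cast<const unsigned char *>(src);
    unsigned char *d = static_cast<unsigned char *>(dest);
    std::vector<unsigned char> tmp(s, s + count);  // snapshot of the source
    for (std::size_t i = 0; i < count; ++i) {
        d[i] = tmp[i];
    }
    return dest;
}
```

On a buffer holding `{1, 2, 3, 4, 5}`, `memmove_via_temp(buf + 1, buf, 4)` yields `{1, 1, 2, 3, 4}`, matching `std::memmove` and differing from a forward element-by-element loop, which is why memmove cannot carry the `restrict` qualifiers that memcpy has.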

keinflue · 2024-11-22T22:21:20Z

@b1ackviking I don't think that was unknown to, or up for discussion by, anyone in this thread. Or am I missing how this is relevant to the remaining question?

b1ackviking · 2024-11-22T22:55:06Z

@keinflue I thought my comment might help explain why the compiler does not optimize memmove even in the second case, which is not obvious to me. Sorry if it was inappropriate; I didn't mean to break in just to say something.

ldionne added the missed-optimization label Nov 22, 2024

EugeneZelenko added the llvm:optimizations label Nov 22, 2024
