Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xsimd::exp broken on aarch64 and GCC 10.2.1 with --fast-math #971

Closed
maxmarsc opened this issue Nov 11, 2023 · 1 comment · Fixed by #980
Closed

xsimd::exp broken on aarch64 and GCC 10.2.1 with --fast-math #971

maxmarsc opened this issue Nov 11, 2023 · 1 comment · Fixed by #980

Comments

@maxmarsc
Copy link
Contributor

maxmarsc commented Nov 11, 2023

Hi, first of all I'd like to precise I've read the issue #548 that seems related, but it seems that the PR mentioned in it was merged ~2y ago so I think it might not be related atfer all ?

Setup

XSIMD versions tested : 11.1.0 / 11.2.0 / e9bd979

I'm cross-compiling with GCC-10.2-2020.11 for aarch64.

flags :
-O3 --ffast-math

The issue

Sometimes, not all times, the result of xsimd::exp on xsimd::batch<std::complex<float>> give the negative of the expected result.

Demo

/*
* xsimd_issue.cpp
* Created by maxmarsc, 11/11/2023
*/

#include <array>
#include <iostream>
#include <xsimd/xsimd.hpp>

namespace xs = xsimd;

int main() {
  using CpxBatch       = xs::batch<std::complex<float>>;
  constexpr auto kSize = 4;
  static_assert(kSize == CpxBatch::size);

  const std::array<std::complex<float>, kSize> input{
      -0.299318 + 2.94764I, -0.974417 + 2.98286I, -1.89003 + 2.94797I,
      -3.1558 + 2.50801I};

  // Scalar version
  std::array<std::complex<float>, kSize> scalar_result{};
  for (auto i = 0; i < kSize; ++i) {
    scalar_result[i] = std::exp(input[i]);
  }

  // XSIMD version
  std::array<std::complex<float>, kSize> xsimd_result{};
  const auto input_batch = xs::load_unaligned(input.data());
  auto rslt              = xs::exp(input_batch);
  rslt.store_unaligned(xsimd_result.data());

  // Compare results
  constexpr auto kEps = 1e-6F;
  auto err_max        = 0.F;
  for (std::size_t i = 0; i < kSize; ++i) {
    err_max = std::max(err_max, std::abs(xsimd_result[i] - scalar_result[i]));
    std::cout << i << " : " << scalar_result[i] << " | " << xsimd_result[i]
              << std::endl;
  }

  if (err_max > kEps)
    return 1;
  return 0;
}

output:

0 : (-0.727424,0.142882) | (0.727424,-0.142882)
1 : (-0.372668,0.0596564) | (0.372668,-0.0596564)
2 : (-0.148244,0.0290676) | (0.148244,-0.0290676)
3 : (-0.0343353,0.0252233) | (0.0343353,-0.0252233)

I've tested with both qemu-aarch64-static and my raspberry pi 3B+ and both gives the same erroneous results

Notes

If I remove --ffast-math the results are correct.

On my intel setup (AVX2 / GCC 11.4.0 / ffast-math) the results are correct.

Please let me know if there's anything else I can do to help 🙂 And thanks a lot for this library !

@maxmarsc
Copy link
Contributor Author

I finally had some time to dig a bit on this, and I tracked down the bug to the xsimd::kernel::bitofsign function.

This function performs a bitwise & operation on -0.f to retrieve the sign bit. But GCC optimize -0.F to 0.F so the sign is inversed, which poison the whole computing chain.

I haven't yet found a nice way to force the value to be -0.f. It would either require to disable ffast-math on the minuszero function (which is the one actually causing the positive zero to be loaded), but this is hard to do because it is generated through a macro.

Another way would be to disable ffast-math on the whole bitofsign function, and load minus zero manually using batch<float, A>::broadcast(-0.F) rather than calling minuszero.

I'll keep digging when I'll have time, but I'm interested if someone have a feedback on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant