`xsimd::exp` broken on aarch64 and GCC 10.2.1 with `--fast-math` #971

maxmarsc · 2023-11-11T16:49:47Z

Hi, first of all I'd like to precise I've read the issue #548 that seems related, but it seems that the PR mentioned in it was merged ~2y ago so I think it might not be related atfer all ?

Setup

XSIMD versions tested : 11.1.0 / 11.2.0 / e9bd979

I'm cross-compiling with GCC-10.2-2020.11 for aarch64.

flags :
-O3 --ffast-math

The issue

Sometimes, not all times, the result of xsimd::exp on xsimd::batch<std::complex<float>> give the negative of the expected result.

Demo

/*
* xsimd_issue.cpp
* Created by maxmarsc, 11/11/2023
*/

#include <array>
#include <iostream>
#include <xsimd/xsimd.hpp>

namespace xs = xsimd;

int main() {
  using CpxBatch       = xs::batch<std::complex<float>>;
  constexpr auto kSize = 4;
  static_assert(kSize == CpxBatch::size);

  const std::array<std::complex<float>, kSize> input{
      -0.299318 + 2.94764I, -0.974417 + 2.98286I, -1.89003 + 2.94797I,
      -3.1558 + 2.50801I};

  // Scalar version
  std::array<std::complex<float>, kSize> scalar_result{};
  for (auto i = 0; i < kSize; ++i) {
    scalar_result[i] = std::exp(input[i]);
  }

  // XSIMD version
  std::array<std::complex<float>, kSize> xsimd_result{};
  const auto input_batch = xs::load_unaligned(input.data());
  auto rslt              = xs::exp(input_batch);
  rslt.store_unaligned(xsimd_result.data());

  // Compare results
  constexpr auto kEps = 1e-6F;
  auto err_max        = 0.F;
  for (std::size_t i = 0; i < kSize; ++i) {
    err_max = std::max(err_max, std::abs(xsimd_result[i] - scalar_result[i]));
    std::cout << i << " : " << scalar_result[i] << " | " << xsimd_result[i]
              << std::endl;
  }

  if (err_max > kEps)
    return 1;
  return 0;
}

output:

0 : (-0.727424,0.142882) | (0.727424,-0.142882)
1 : (-0.372668,0.0596564) | (0.372668,-0.0596564)
2 : (-0.148244,0.0290676) | (0.148244,-0.0290676)
3 : (-0.0343353,0.0252233) | (0.0343353,-0.0252233)

I've tested with both qemu-aarch64-static and my raspberry pi 3B+ and both gives the same erroneous results

Notes

If I remove --ffast-math the results are correct.

On my intel setup (AVX2 / GCC 11.4.0 / ffast-math) the results are correct.

Please let me know if there's anything else I can do to help 🙂 And thanks a lot for this library !

The text was updated successfully, but these errors were encountered:

maxmarsc · 2023-11-23T14:22:25Z

I finally had some time to dig a bit on this, and I tracked down the bug to the xsimd::kernel::bitofsign function.

This function performs a bitwise & operation on -0.f to retrieve the sign bit. But GCC optimize -0.F to 0.F so the sign is inversed, which poison the whole computing chain.

I haven't yet found a nice way to force the value to be -0.f. It would either require to disable ffast-math on the minuszero function (which is the one actually causing the positive zero to be loaded), but this is hard to do because it is generated through a macro.

Another way would be to disable ffast-math on the whole bitofsign function, and load minus zero manually using batch<float, A>::broadcast(-0.F) rather than calling minuszero.

I'll keep digging when I'll have time, but I'm interested if someone have a feedback on this

maxmarsc mentioned this issue Nov 24, 2023

Fixed constants::signmask for GCC when using ffast-math #980

Merged

JohanMabille closed this as completed in #980 Nov 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`xsimd::exp` broken on aarch64 and GCC 10.2.1 with `--fast-math` #971

`xsimd::exp` broken on aarch64 and GCC 10.2.1 with `--fast-math` #971

maxmarsc commented Nov 11, 2023 •

edited

Loading

maxmarsc commented Nov 23, 2023

xsimd::exp broken on aarch64 and GCC 10.2.1 with --fast-math #971

xsimd::exp broken on aarch64 and GCC 10.2.1 with --fast-math #971

Comments

maxmarsc commented Nov 11, 2023 • edited Loading

Setup

The issue

Demo

Notes

maxmarsc commented Nov 23, 2023

`xsimd::exp` broken on aarch64 and GCC 10.2.1 with `--fast-math` #971

`xsimd::exp` broken on aarch64 and GCC 10.2.1 with `--fast-math` #971

maxmarsc commented Nov 11, 2023 •

edited

Loading