Fix performance issue with nanstd and nanvar #2308
I think the failures are due to missing overloads of … I have ongoing work on xsimd to push these functions; I'll try to push them quickly so that we can merge this PR.
Thanks a lot @JohanMabille! Actually, the unit test without SIMD failed too. I will investigate it in the meantime.
bc16cdd to 622647d
622647d to d01be62
- return nanmean<result_type>(square(cast<result_type>(sc) - std::move(mrv)), std::forward<X>(axes), es);
+ // note: otherwise the result is wrong with 'immediate' evaluation strategy
+ auto sc_shifted = eval(cast<result_type>(sc) - std::move(mrv));
+ return nanmean<result_type>(square(sc_shifted), std::forward<X>(axes), es);
I am completely lost, so I am summarizing my findings here:
Test code:
xt::xarray<double> aN = {{ nanv, nanv, 123, 3 }, { 1, 2, nanv, 3 }, { 1, 1, nanv, 3 }};
xt::xarray<double> ret2 = xt::nanvar(aN, {1});
std::cout << "Lazy: " << ret2 << std::endl;
xt::xarray<double> ret = xt::nanvar(aN, {1}, xt::evaluation_strategy::immediate);
std::cout << "Immediate: " << ret << std::endl;
- First case:
auto sc_shifted = eval(cast<result_type>(sc) - std::move(mrv));
return nanmean<result_type>(square(sc_shifted), std::forward<X>(axes), es);
Result with gcc8:
Lazy: { 2400. , 0.666667, 0.888889}
Immediate: { 3600. , 0.666667, 0.888889}
Result with clang7: segfault
je 0x4198a0 ; <xt::xreducer_stepper<xt::xreducer_functors<xt::detail::nan_plus, xt::const_value<double>, xt::detail::nan_plus>, xt::xshared_expression<xt::xfunction<xt::detail::lambda_adapt<xt::square_fct>, xt::xarray_container<xt::uvector<double, std::allocator<double> >, (xt::layout_type)1, xt::svector<unsigned long, 4ul, std::allocator<unsigned long>, true>, xt::xtensor_expression_tag> const&> >, std::array<unsigned long, 1ul>, xt::reducer_options<double, std::tuple<xt::evaluation_strategy::lazy_type> > >::aggregate_impl(unsigned long, std::integral_constant<bool, false>) const+784>
shl $0x3,%rcx
neg %rax
mov %rdi,%rsi
nopw 0x0(%rax,%rax,1)
vmovsd (%rsi),%xmm0
- Second case:
return nanmean<result_type>(square(cast<result_type>(sc) - std::move(mrv)), std::forward<X>(axes), es);
Result with gcc8:
Lazy: { 3600. , 0.666667, 0.888889}
Immediate: { 7365.388889, 4.666667, 12. }
Result with clang7:
Lazy: { 3600. , 0.666667, 0.888889}
Immediate: { 7365.388889, 12. , 363. }
@JohanMabille @tdegeus Can you give me a hint? Thank you!
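To tell which of the outputs above are actually correct, the expected values can be hand-checked without xtensor. The sketch below is plain standard C++ and is not the library's implementation. It assumes that nanvar skips NaN entries and divides by the number of non-NaN entries (population variance), which is consistent with the 0.666667 and 0.888889 values reported above. The helper names `nan_var_row`, `nan_var_axis1` and `test_data` are hypothetical, and `nanv` is assumed to be a quiet NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of xtensor): NaN-ignoring population variance
// of one row, in two passes: mean of the valid entries first, then the mean
// of the squared deviations from that mean.
double nan_var_row(const std::vector<double>& row)
{
    double sum = 0.0;
    std::size_t count = 0;
    for (double v : row)
    {
        if (!std::isnan(v))
        {
            sum += v;
            ++count;
        }
    }
    if (count == 0)
    {
        // No valid entry: the variance is undefined, so return NaN.
        return std::numeric_limits<double>::quiet_NaN();
    }
    const double mean = sum / static_cast<double>(count);

    double sq_sum = 0.0;
    for (double v : row)
    {
        if (!std::isnan(v))
        {
            const double d = v - mean;
            sq_sum += d * d;
        }
    }
    return sq_sum / static_cast<double>(count);
}

// Hypothetical helper: reduce every row of a 2-D table, i.e. along axis 1.
std::vector<double> nan_var_axis1(const std::vector<std::vector<double>>& a)
{
    std::vector<double> out;
    out.reserve(a.size());
    for (const auto& row : a)
    {
        out.push_back(nan_var_row(row));
    }
    return out;
}

// Same values as aN in the test code; nanv is assumed to be a quiet NaN.
std::vector<std::vector<double>> test_data()
{
    const double nanv = std::numeric_limits<double>::quiet_NaN();
    return {{nanv, nanv, 123, 3}, {1, 2, nanv, 3}, {1, 1, nanv, 3}};
}
```

Calling `nan_var_axis1(test_data())` gives 3600, 2/3 and 8/9. Row 0 has valid entries 123 and 3, mean 63, deviations of +60 and -60, hence 3600. Under the stated assumption, `{ 3600., 0.666667, 0.888889 }` is therefore the expected result: the "Immediate" output of the first case and the "Lazy" output of the second case match it, while 2400 and the 7365.388889 rows do not.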