Why is gcem much slower than cmath? #45
Comments
I believe there is a lot of room to optimise the runtime performance of gcem. I was able to cut the run time by about 40% simply by changing the recursion in the tan operation to a loop:
template<int max_depth, typename T>
constexpr
T
tan_cf_loop(const T xx)
noexcept
{
    // evaluate the continued fraction from the innermost term outwards
    T ans = T(2*max_depth - 1);
    for(int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans;
    }
    return ans;
}
template<typename T>
constexpr
T
tan_cf_main(const T x)
noexcept
{
    return( (x > T(1.55) && x < T(1.60)) ?
                tan_series_exp(x) : // deals with a singularity at tan(pi/2)
            //
            x > T(1.4) ?
                x/tan_cf_loop<45>(x*x) :
            x > T(1) ?
                x/tan_cf_loop<35>(x*x) :
            // else
                x/tan_cf_loop<25>(x*x) );
}
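To check that the loop computes the same value as a recursion, here is a minimal sketch. The recursive form below is my reading of the kind of recursion being replaced, not a copy of gcem's source, and every name ending in `_sketch` is hypothetical. It needs C++14 or later because of the loop inside a `constexpr` function.

```cpp
#include <cassert>
#include <cmath>

// Assumed recursive form: depth counts up from 1 to max_depth.
template<typename T>
constexpr T tan_cf_recur_sketch(const T xx, const int depth, const int max_depth) noexcept
{
    return depth < max_depth
        ? T(2*depth - 1) - xx / tan_cf_recur_sketch(xx, depth + 1, max_depth)
        : T(2*depth - 1);
}

// Loop form: start at the innermost term and work outwards.
template<int max_depth, typename T>
constexpr T tan_cf_loop_sketch(const T xx) noexcept
{
    T ans = T(2*max_depth - 1);
    for (int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans;
    }
    return ans;
}

// tan(x) is approximated by x divided by the continued fraction in x^2.
template<int max_depth, typename T>
constexpr T tan_sketch(const T x) noexcept
{
    return x / tan_cf_loop_sketch<max_depth>(x * x);
}

// Both forms perform the same operations in the same order,
// so they agree exactly even at compile time.
static_assert(tan_cf_loop_sketch<3>(1.0) == tan_cf_recur_sketch(1.0, 1, 3),
              "loop and recursion should agree");
```

Calling `tan_sketch<25>(0.5)` at runtime should agree with `std::tan(0.5)` to within rounding error. The sketch has no range reduction and no handling of the singularity near pi/2.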
Also, I don't really understand why gcem computes sine and cosine from tan(x/2), which takes 45 iterations in the worst case. Approximating sine and cosine with Chebyshev polynomials should be a better choice.
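For context, the tan(x/2) route presumably relies on the half-angle identities sin x = 2t/(1 + t²) and cos x = (1 − t²)/(1 + t²) with t = tan(x/2). The sketch below is an illustration under that assumption, not gcem's code: the `_sketch` names and the fixed depth of 25 are hypothetical, and there is no range reduction, so it is only meant for small |x|.

```cpp
#include <cassert>
#include <cmath>

// Continued-fraction tan, as in the loop version above (hypothetical name).
template<int max_depth, typename T>
constexpr T tan_cf_sketch(const T x) noexcept
{
    const T xx = x * x;
    T ans = T(2*max_depth - 1);
    for (int depth = max_depth - 1; depth > 0; --depth) {
        ans = T(2*depth - 1) - xx / ans;   // one division per iteration
    }
    return x / ans;
}

// sin(x) = 2t / (1 + t^2), with t = tan(x/2)
template<typename T>
constexpr T sin_half_angle_sketch(const T x) noexcept
{
    const T t = tan_cf_sketch<25>(x / T(2));
    return T(2) * t / (T(1) + t * t);
}

// cos(x) = (1 - t^2) / (1 + t^2), with t = tan(x/2)
template<typename T>
constexpr T cos_half_angle_sketch(const T x) noexcept
{
    const T t = tan_cf_sketch<25>(x / T(2));
    return (T(1) - t * t) / (T(1) + t * t);
}
```

Each call costs one floating-point division per continued-fraction level plus the final divisions, which is the overhead a fixed-degree polynomial evaluated with multiply-adds would avoid.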
I have created a pull request, #46, that optimises the trigonometric calculations.
I have some functions in my library that need to be callable at both compile time and runtime. cmath's support for constexpr varies between platforms, so I chose to use gcem.
In using it, though, I found that many of gcem's functions are an order of magnitude slower than cmath under O3 optimization. I know that I can write two versions, one called at compile time and one at runtime, but I'm wondering why gcem is so much slower at runtime?
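One way to write the two versions behind a single entry point is sketched below, assuming a C++20 toolchain so that `std::is_constant_evaluated()` is available. On older standards the sketch falls back to the constexpr path everywhere. The `_sketch` functions are hypothetical stand-ins. The constexpr path here is a plain continued fraction, not a gcem call.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Stand-in for a constexpr-capable implementation (hypothetical).
constexpr double tan_constexpr_sketch(const double x) noexcept
{
    const double xx = x * x;
    double ans = 2.0 * 25 - 1.0;
    for (int depth = 24; depth > 0; --depth) {
        ans = (2.0 * depth - 1.0) - xx / ans;
    }
    return x / ans;
}

// Single entry point: constexpr path during constant evaluation,
// std::tan at runtime when the library feature is present.
constexpr double tan_dispatch_sketch(const double x) noexcept
{
#if defined(__cpp_lib_is_constant_evaluated)
    if (std::is_constant_evaluated()) {
        return tan_constexpr_sketch(x);
    }
    return std::tan(x);
#else
    return tan_constexpr_sketch(x);   // pre-C++20 fallback: no dispatch
#endif
}

// Usable in a constant expression either way.
static_assert(tan_dispatch_sketch(0.5) > 0.54 && tan_dispatch_sketch(0.5) < 0.55,
              "tan(0.5) is roughly 0.546");
```

This does not explain the runtime gap, but it keeps call sites unchanged while the runtime path uses the platform's cmath.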
I've tested this on x86 Linux, Windows and macOS, compiling with g++, MSVC and Apple Clang respectively, and all give roughly the same results.
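For anyone who wants to reproduce the comparison, a rough timing harness could look like the sketch below. This is not the benchmark I ran. The helper name, the input range and the use of `std::chrono::steady_clock` are all assumptions for illustration, and a proper benchmark would need warm-up and repeated runs.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Returns the average wall-clock time per call of f, in nanoseconds.
template<typename F>
double time_per_call_ns_sketch(F f, const int n_calls)
{
    volatile double sink = 0.0;   // keeps the optimiser from dropping the loop
    double acc = 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_calls; ++i) {
        const double x = 0.1 + 1.0e-7 * i;   // vary the input slightly on each call
        acc += f(x);
    }
    const auto stop = std::chrono::steady_clock::now();

    sink = acc;
    (void)sink;

    return std::chrono::duration<double, std::nano>(stop - start).count() / n_calls;
}
```

Typical use would be to call it once with `[](double x) { return std::tan(x); }` and once with a lambda wrapping the gcem function under test, using the same `n_calls`, then compare the two numbers.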