Implementation suggestion for complex multiplication to improve performance #91
Comments
A few thoughts:
It's reasonable to start with that everywhere, and then you can remove the clone on the last use of a given value to instead consume it. So in your code |
Wouldn't this also be an issue with the current multiplication ? ie,
I'm really not an expert on generic Rust (though I'd very much like to become one :D) so I don't feel confident giving you my input here.
I don't have any such benchmarks in Rust, but if you tell me it's relevant, I can dig through my files and adapt some old C of mine this weekend, and provide you the source so you can run the benchmark. Or I can also try to fork this repo and make the changes, but I'm really not sure how to set up such a fork or dependency with cargo. Otherwise, I could try to figure out how to use the system clock in Rust. The simplest thing is just to:
I can do that in C for you right now if you'd like.
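On the system clock question: a minimal sketch of wall-clock timing using only the Rust standard library (`std::time::Instant` and `std::hint::black_box`). The helper name `time_iterations` is made up for illustration, and this is a rough measurement under those assumptions, not a replacement for a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` exactly `iters` times and returns the total elapsed wall-clock time.
/// `time_iterations` is a hypothetical helper name, not part of any crate.
pub fn time_iterations<F: FnMut() -> f64>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed()
}
```

Dividing the returned `Duration` by `iters` gives an approximate per-iteration time; results will be noisy for very short closures.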
Thanks for the helpful advice! Given that logic, I suppose I forgot to clone
Not sure if this is useful, but a hopefully fairly straightforward benchmark for complex number multiplication could come from a straightforward fractal loop:
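A minimal sketch of what such a fractal loop could look like, assuming `f64` components. The `C64` type and the `julia_iterations` function are stand-ins invented for this example; a real benchmark would use the crate's own complex type and multiplication.

```rust
/// Hypothetical minimal complex type, used here only to keep the example self-contained.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C64 {
    pub re: f64,
    pub im: f64,
}

impl C64 {
    /// Schoolbook product: 4 multiplications, 2 additions/subtractions.
    pub fn mul(self, o: C64) -> C64 {
        C64 {
            re: self.re * o.re - self.im * o.im,
            im: self.re * o.im + self.im * o.re,
        }
    }

    /// Component-wise sum.
    pub fn add(self, o: C64) -> C64 {
        C64 { re: self.re + o.re, im: self.im + o.im }
    }

    /// Squared magnitude, which avoids a square root in the escape test.
    pub fn norm_sqr(self) -> f64 {
        self.re * self.re + self.im * self.im
    }
}

/// Iterates z <- z*z + c until |z| > 2 or `max_iter` is reached,
/// and returns the number of iterations performed.
pub fn julia_iterations(mut z: C64, c: C64, max_iter: u32) -> u32 {
    let mut n = 0;
    while n < max_iter && z.norm_sqr() <= 4.0 {
        z = z.mul(z).add(c);
        n += 1;
    }
    n
}
```

Starting values that stay bounded for many iterations make each call perform one complex multiplication per step, which is what makes the loop usable as a multiplication benchmark.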
I tweaked some Rust-based Julia fractal code I've been fiddling with recently to spit me out some example values where it had to do lots of iteration, to get me some useful values for c and z. That seems to result in sufficient multiplications as to provide a measurement of performance. I can easily influence the performance quite drastically by deliberately breaking the
Here's the benchmark result from this benchmark against current master:
With my Ryzen 7 3800X, I get 248 ns/iter on that benchmark with current master, and 264 ns/iter with the suggested changes. Most of that time is spent in
Here are the two implementations on godbolt, using
It doesn't look surprising at this level -- indeed showing 4
The current implementation for complex multiplication uses the "naive" algorithm, see lib.rs lines 683-692. There exists a slightly faster alternative, a variation on the Karatsuba algorithm, for complex numbers. This algorithm works using only 3 multiplications instead of 4, but 5 additions/subtractions instead of 2. On platforms where addition and multiplication cost the same, it might not be an improvement, but I've consistently seen this algorithm show improved performance on various platforms in C (though I'm not familiar with the Rust compiler, so you might want to test things with some benchmarks).
This is the algorithm's pseudocode:
source: https://en.wikipedia.org/wiki/Multiplication_algorithm#Complex_multiplication_algorithm
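For reference, the three-multiplication scheme can be written out as below. This is the standard form of the identity, restated here with the symbols $a, b, c, d$ and $k_1, k_2, k_3$ chosen for this example, for the product $(a + bi)(c + di)$:

```latex
\begin{aligned}
k_1 &= c\,(a + b) \\
k_2 &= a\,(d - c) \\
k_3 &= b\,(c + d) \\
\operatorname{Re} &= k_1 - k_3 = ac - bd \\
\operatorname{Im} &= k_1 + k_2 = ad + bc
\end{aligned}
```

Counting operations gives 3 multiplications and 5 additions/subtractions, against 4 and 2 for the schoolbook formula $(ac - bd) + (ad + bc)i$.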
This would correspond, if I'm not mistaken (I'm currently learning Rust by drawing fractals, which is why I noticed this possible performance improvement, but I'm not sure where to `.clone()` or not, so I put it everywhere), to the following code:
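A minimal sketch of how the three-multiplication scheme might look in generic Rust, next to the schoolbook version for comparison. The `Complex<T>` struct, its field names, and the function names `mul_naive` and `mul_3m` are assumptions made for illustration, not the crate's actual API. Each value is cloned except at its last use, where it is moved instead.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical minimal complex type; field names are assumed for this example.
#[derive(Clone, Debug, PartialEq)]
pub struct Complex<T> {
    pub re: T,
    pub im: T,
}

/// Schoolbook product: 4 multiplications, 2 additions/subtractions.
pub fn mul_naive<T>(x: Complex<T>, y: Complex<T>) -> Complex<T>
where
    T: Clone + Add<Output = T> + Sub<Output = T> + Mul<Output = T>,
{
    let re = x.re.clone() * y.re.clone() - x.im.clone() * y.im.clone();
    // Last use of every field, so they are moved rather than cloned.
    let im = x.re * y.im + x.im * y.re;
    Complex { re, im }
}

/// Three-multiplication variant: 3 multiplications, 5 additions/subtractions.
pub fn mul_3m<T>(x: Complex<T>, y: Complex<T>) -> Complex<T>
where
    T: Clone + Add<Output = T> + Sub<Output = T> + Mul<Output = T>,
{
    let Complex { re: a, im: b } = x;
    let Complex { re: c, im: d } = y;
    // k1 = c * (a + b)
    let k1 = c.clone() * (a.clone() + b.clone());
    // k2 = a * (d - c); last use of `a`, so it is moved.
    let k2 = a * (d.clone() - c.clone());
    // k3 = b * (c + d); last use of `b`, `c`, and `d`.
    let k3 = b * (c + d);
    // re = k1 - k3, im = k1 + k2
    Complex { re: k1.clone() - k3, im: k1 + k2 }
}
```

For integer types the two functions agree exactly. For floating-point types the results can differ in the last bits, because the operations are grouped differently and rounding is not associative.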