Add Rayon implementation `filter_clamped_parallel` for `Kernel` filter function (formerly `filter3x3`, ...) #608
Normally, we add parallel functions with …
That sounds like a solution to my problem. I'll refactor it.
@ripytide Rewrote it. There's quite a lot of code duplication now with the bigger driver functions. I also added the …
Currently, this parallelizes each row of the image. In our discussions and experimentation in #602, we found that it may be worth simply parallelizing per pixel rather than per row, as … Also, regarding the code duplication: would it be possible to extract the kernel code from both functions and simply call it in normal for loops for the single-threaded versions, and call it in the …
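To make the per-row versus per-pixel trade-off concrete: splitting by rows yields at most `height` independent work units, so a very wide but short image (say 10000×2) cannot keep more than two threads busy. Below is a minimal, dependency-free sketch of flat per-pixel splitting; `split_flat` and `to_xy` are hypothetical helpers for illustration, not imageproc or Rayon APIs.

```rust
use std::ops::Range;

/// Hypothetical helper: split `len` flat pixel indices into at most `parts`
/// contiguous ranges of near-equal size, ignoring row boundaries.
/// Unlike per-row splitting, the number of work units is not capped by the height.
pub fn split_flat(len: usize, parts: usize) -> Vec<Range<usize>> {
    // Never create more ranges than pixels, but always return at least one range.
    let parts = parts.max(1).min(len.max(1));
    let base = len / parts;
    let extra = len % parts;
    let mut ranges = Vec::with_capacity(parts);
    let mut start = 0;
    for i in 0..parts {
        // The first `extra` ranges each take one additional pixel.
        let size = base + usize::from(i < extra);
        ranges.push(start..start + size);
        start += size;
    }
    ranges
}

/// Hypothetical helper: map a flat, row-major pixel index back to `(x, y)`.
pub fn to_xy(i: usize, width: usize) -> (usize, usize) {
    (i % width, i / width)
}
```

With Rayon, a comparable per-pixel scheme could iterate `out.par_iter_mut().enumerate()` over the whole output buffer and let work-stealing pick the split points, recovering `(x, y)` from the index as `to_xy` does here. The cost is the per-pixel index arithmetic and potentially less cache-friendly access than whole rows.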
This refactors the previous commit by keeping the original versions of the filter functions as-is and adding the _parallel variants only if the rayon feature is enabled. There is some code duplication here now subject to further refactoring.
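A minimal sketch of the gating pattern this commit describes, assuming `rayon` is declared as an optional dependency in `Cargo.toml` (for example `rayon = { version = "1", optional = true }`, which implicitly defines a `rayon` feature). `map_values` and `map_values_parallel` are hypothetical stand-ins for a filter and its `_parallel` variant: the sequential function is always compiled, and the parallel one only exists when the feature is enabled.

```rust
/// Sequential variant: always compiled.
/// Hypothetical stand-in for a filter: applies `f` to every value.
pub fn map_values(input: &[f32], f: impl Fn(f32) -> f32) -> Vec<f32> {
    input.iter().map(|&v| f(v)).collect()
}

/// Parallel variant: compiled only when the crate's `rayon` feature is enabled.
/// `f` must be `Sync` because Rayon may call it from several threads at once.
#[cfg(feature = "rayon")]
pub fn map_values_parallel(input: &[f32], f: impl Fn(f32) -> f32 + Sync) -> Vec<f32> {
    use rayon::prelude::*;
    input.par_iter().map(|&v| f(v)).collect()
}
```

Builds without the feature never reference `rayon` at all, so the dependency stays genuinely optional for downstream users.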
Good point! I didn't even consider pixel-wise operations due to a somewhat baseless concern about hypothetical cache coherency issues, i.e. I was essentially on the same page as this thought, with the known gotcha that it degrades with special image dimensions. Would you want me to try anyway, or leave it as it is for the time being and improve on it later? My thought here is that it is already much faster than the non-parallelized version, and now that the sequential and parallel versions are both available, caller code can make informed decisions about which one to prefer when. I'm fine with trying more (and will), but I'm not exactly a Rayon expert. 😅 Regardless of the parallelization, I'll definitely have a look at refactoring the code duplication away. ✌🏼
As you say, having a parallel implementation is much more important than making it as fast as possible, since that can always be improved in the future. So I'd say just make the code as readable and de-duplicated as you can, and don't worry too much about micro-benchmarking.
Conflicts: src/filter/mod.rs, src/filter/sharpen.rs, src/gradients.rs, src/seam_carving.rs
Looks good overall 👍
I'm going to take a shot at rewriting `filter()` so that the inner kernel bit is its own separate function, which can then be called from both the non-parallel and the parallel versions.
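A rough sketch of that structure, under stated assumptions: one hypothetical per-pixel kernel (`blur_pixel`, here a clamped 3×3 box blur on a grayscale `u8` buffer) is called both from a plain nested loop and from a parallel driver. The PR itself uses Rayon; to keep this example dependency-free, the parallel driver uses `std::thread::scope` and hands out blocks of rows, loosely analogous to `par_chunks_mut(width)` in Rayon. None of these names are imageproc's actual API.

```rust
use std::thread;

/// Hypothetical shared kernel: rounded mean of the 3x3 neighbourhood of (x, y),
/// with out-of-bounds neighbours clamped to the nearest edge pixel.
fn blur_pixel(src: &[u8], width: usize, height: usize, x: usize, y: usize) -> u8 {
    let mut sum = 0u32;
    for dy in -1i64..=1 {
        for dx in -1i64..=1 {
            // Clamp neighbour coordinates to the image bounds.
            let nx = (x as i64 + dx).clamp(0, width as i64 - 1) as usize;
            let ny = (y as i64 + dy).clamp(0, height as i64 - 1) as usize;
            sum += u32::from(src[ny * width + nx]);
        }
    }
    // Rounded mean of 9 samples; at most 255, so it fits in a u8.
    ((sum + 4) / 9) as u8
}

/// Sequential driver: plain nested loops calling the shared kernel.
pub fn blur3x3(src: &[u8], width: usize, height: usize) -> Vec<u8> {
    let mut out = vec![0u8; width * height];
    for y in 0..height {
        for x in 0..width {
            out[y * width + x] = blur_pixel(src, width, height, x, y);
        }
    }
    out
}

/// Parallel driver: the same kernel, with blocks of whole rows per scoped thread.
pub fn blur3x3_parallel(src: &[u8], width: usize, height: usize, threads: usize) -> Vec<u8> {
    let mut out = vec![0u8; width * height];
    if out.is_empty() {
        return out;
    }
    let threads = threads.max(1);
    let rows_per_block = (height + threads - 1) / threads;
    thread::scope(|s| {
        for (i, block) in out.chunks_mut(rows_per_block * width).enumerate() {
            s.spawn(move || {
                let y0 = i * rows_per_block;
                for (j, px) in block.iter_mut().enumerate() {
                    *px = blur_pixel(src, width, height, j % width, y0 + j / width);
                }
            });
        }
    });
    out
}
```

Because both drivers call the same kernel, their outputs are identical by construction; the parallel version only adds the scheduling, which is the de-duplication being proposed here.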
Circling back to the earlier comments, I did try a pixel-wise parallelization via a naive …
Okay, I've finished the from-scratch rewrite (commit here) using an inner per-pixel filter function, and according to my testing the new version is 1.02x faster sequentially and 1.2x faster in parallel, which is within the margin of error. It's also de-duplicated as discussed. Do you want to pull that commit into this PR, or should I open a separate PR after this one is merged?
@ripytide Cool! I merged it in here.
Thanks for this. The new functionality looks good. However, I'm not keen on reducing the flexibility of the existing …
What do you mean by reducing the flexibility of …? Well, except for …
According to my benchmarking, the non-parallel version is within the noise margin of the previous implementation's speed. Here is the raw benchmark comparison output:
Yes, I meant this change to the function signature. I don't feel too strongly; it's probably a pretty niche use case, and if we do break anyone's code, I guess they'll let us know. I'll merge this after the merge conflict is resolved.
I've rebased this in #642.
Thank you! Should we close this PR then?
Yeah, can do.
Closing in favor of #642.
This adds a parallelized implementation of the `filter_clamped` function (formerly `filter3x3`).