
[Proof of Concept] Using PointNeighbors.jl instead of a vector of bonds #105

Draft: wants to merge 4 commits into base: main

@efaulhaber commented Jun 10, 2024

Based on trixi-framework/PointNeighbors.jl#10.

There's a small speedup (about 1.15x) for small problems on a single thread on my laptop (Apple M2 Pro):

julia> plot_benchmarks(40, 2)
Original Code
with 6400 points finished in 2.421 ms

GridNeighborhoodSearch
with 6400 points finished in 7.659 ms

NeighborListsNeighborhoodSearch
with 6400 points finished in 2.095 ms

NeighborListsNHS contiguous
with 6400 points finished in 2.126 ms

Original Code
with 23814 points finished in 10.803 ms

GridNeighborhoodSearch
with 23814 points finished in 35.649 ms

NeighborListsNeighborhoodSearch
with 23814 points finished in 9.305 ms

NeighborListsNHS contiguous
with 23814 points finished in 9.486 ms
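The labels above presumably compare the original bond-vector code against neighbor lists from PointNeighbors.jl. To make the data-structure difference concrete, here is a minimal Julia sketch of three layouts: a flat vector of bonds, one neighbor vector per point, and a single contiguous (CSR-style) array. All names (`Bond`, `build_bonds`, `neighbor_lists`, `to_contiguous`, `sum_over_neighbors`) are hypothetical, the brute-force search is for illustration only, and reading "contiguous" as a CSR-style layout is an assumption, not something the output above confirms.

```julia
# Hypothetical sketch of three ways to store which points interact.

# (1) Vector of bonds: one entry per (point, neighbor) pair.
struct Bond
    point::Int
    neighbor::Int
end

# Brute-force construction of all bonds within `radius` (O(n^2), illustration only).
function build_bonds(coords::Vector{NTuple{2,Float64}}, radius)
    bonds = Bond[]
    for i in eachindex(coords), j in eachindex(coords)
        i == j && continue
        dx = coords[i][1] - coords[j][1]
        dy = coords[i][2] - coords[j][2]
        if dx^2 + dy^2 <= radius^2
            push!(bonds, Bond(i, j))
        end
    end
    return bonds
end

# (2) Neighbor lists: one vector of neighbor indices per point.
function neighbor_lists(bonds::Vector{Bond}, n_points)
    lists = [Int[] for _ in 1:n_points]
    for b in bonds
        push!(lists[b.point], b.neighbor)
    end
    return lists
end

# (3) Contiguous (CSR-style, assumed) layout: all neighbors in one flat array,
# where `offsets[i]:(offsets[i + 1] - 1)` indexes the neighbors of point i.
function to_contiguous(lists::Vector{Vector{Int}})
    offsets = Vector{Int}(undef, length(lists) + 1)
    offsets[1] = 1
    for i in eachindex(lists)
        offsets[i + 1] = offsets[i] + length(lists[i])
    end
    neighbors = collect(Int, Iterators.flatten(lists))
    return offsets, neighbors
end

# Sum `f(neighbor)` over the neighbors of point i in the contiguous layout.
function sum_over_neighbors(f, offsets, neighbors, i)
    s = 0.0
    for k in offsets[i]:(offsets[i + 1] - 1)
        s += f(neighbors[k])
    end
    return s
end

# Usage: three points on a line, interaction radius 1.5.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
bonds = build_bonds(coords, 1.5)
lists = neighbor_lists(bonds, length(coords))
offsets, neighbors = to_contiguous(lists)
```

Only points 1 and 2 are within the radius here, so the contiguous layout stores just two neighbor indices, and point 3 gets an empty range.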

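For larger problems on 64 threads of a Threadripper 3990X, there is up to a 2x speedup: about 1.7x (neighbor lists) and 2.2x (contiguous) at 379215 points, but none at 100000 points: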

julia> plot_benchmarks(100, 2)
Original Code
with 100000 points finished in 2.465 ms

GridNeighborhoodSearch
with 100000 points finished in 6.597 ms

NeighborListsNeighborhoodSearch
with 100000 points finished in 2.536 ms

NeighborListsNHS contiguous
with 100000 points finished in 2.560 ms

Original Code
with 379215 points finished in 21.291 ms

GridNeighborhoodSearch
with 379215 points finished in 24.278 ms

NeighborListsNeighborhoodSearch
with 379215 points finished in 12.852 ms

NeighborListsNHS contiguous
with 379215 points finished in 9.820 ms

On a single thread there is no speedup; the neighbor-list versions are roughly 9% to 23% slower (maybe due to more cache per thread?):

julia> plot_benchmarks(100, 2)
Original Code
with 100000 points finished in 73.701 ms

GridNeighborhoodSearch
with 100000 points finished in 230.978 ms

NeighborListsNeighborhoodSearch
with 100000 points finished in 90.497 ms

NeighborListsNHS contiguous
with 100000 points finished in 81.854 ms

Original Code
with 379215 points finished in 299.921 ms

GridNeighborhoodSearch
with 379215 points finished in 993.680 ms

NeighborListsNeighborhoodSearch
with 379215 points finished in 355.863 ms

NeighborListsNHS contiguous
with 379215 points finished in 328.210 ms
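The speedup claims can be checked directly against the two `plot_benchmarks(100, 2)` runs above. This small Julia snippet (helper names hypothetical, timings copied verbatim from the output) divides the original time by each neighbor-list time, so values above 1 mean faster and values below 1 mean slower.

```julia
# Timings in ms, copied from the 64-thread and single-thread runs above.
timings_64_threads = Dict(
    100_000 => (original = 2.465, lists = 2.536, contiguous = 2.560),
    379_215 => (original = 21.291, lists = 12.852, contiguous = 9.820),
)
timings_1_thread = Dict(
    100_000 => (original = 73.701, lists = 90.497, contiguous = 81.854),
    379_215 => (original = 299.921, lists = 355.863, contiguous = 328.210),
)

# Speedup relative to the original code: > 1 is faster, < 1 is slower.
speedup(t_original, t_new) = t_original / t_new

function print_speedups(label, timings)
    println(label)
    for n in sort(collect(keys(timings)))
        t = timings[n]
        println("  $n points: lists $(round(speedup(t.original, t.lists); digits=2))x, ",
                "contiguous $(round(speedup(t.original, t.contiguous); digits=2))x")
    end
end

print_speedups("64 threads", timings_64_threads)
print_speedups("1 thread", timings_1_thread)
```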

@@ -20,8 +20,56 @@ function _calc_force_density!(storage::AbstractStorage, system::AbstractSystem,
                               params::AbstractPointParameters, each_point_idx)
     storage.b_int .= 0
     storage.n_active_bonds .= 0
-    for point_id in each_point_idx
+    Threads.@threads :static for point_id in each_point_idx
@efaulhaber (Author) commented Jun 10, 2024:

I wanted to ignore the chunks and use a single chunk, over which I run a threaded loop. The performance is probably similar; I just wanted more threads, and therefore less cache per thread, and this was easier than working through the chunking code.
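A threaded loop like the one in the diff is race-free as long as each iteration only writes the entries of its own point, which the per-point reset of `storage.b_int` and `storage.n_active_bonds` suggests but which is an assumption here. The Julia sketch below contrasts explicit chunks (one task per chunk via `Threads.@spawn`) with the single-chunk `Threads.@threads :static` loop. `ToyStorage`, `toy_point_force!`, `calc_chunked!`, `calc_single_chunk!`, the CSR-style `offsets`/`neighbors` arrays, and the toy "force" are all hypothetical stand-ins, not the package's `_calc_force_density!` or its chunking code.

```julia
# Hypothetical stand-in for the per-point storage reset in the diff.
struct ToyStorage
    b_int::Matrix{Float64}        # toy force density, one column per point
    n_active_bonds::Vector{Int}   # number of neighbors per point
end

# Work for one point. It writes only column/entry `point_id`, so different
# points can be processed concurrently without locks.
function toy_point_force!(storage, coords, offsets, neighbors, point_id)
    for k in offsets[point_id]:(offsets[point_id + 1] - 1)
        neighbor = neighbors[k]
        for d in axes(coords, 1)
            # Toy "force": vector from the point to its neighbor.
            storage.b_int[d, point_id] += coords[d, neighbor] - coords[d, point_id]
        end
        storage.n_active_bonds[point_id] += 1
    end
    return nothing
end

# (a) Explicit chunks: split the points into `n_chunks` ranges, one task each.
function calc_chunked!(storage, coords, offsets, neighbors, n_chunks)
    storage.b_int .= 0
    storage.n_active_bonds .= 0
    points = axes(coords, 2)
    chunk_size = max(1, cld(length(points), n_chunks))
    tasks = map(Iterators.partition(points, chunk_size)) do chunk
        Threads.@spawn for point_id in chunk
            toy_point_force!(storage, coords, offsets, neighbors, point_id)
        end
    end
    foreach(wait, tasks)
    return storage
end

# (b) One chunk: a single threaded loop, as in the diff. `:static` gives
# each thread one contiguous block of iterations.
function calc_single_chunk!(storage, coords, offsets, neighbors)
    storage.b_int .= 0
    storage.n_active_bonds .= 0
    Threads.@threads :static for point_id in axes(coords, 2)
        toy_point_force!(storage, coords, offsets, neighbors, point_id)
    end
    return storage
end

# Usage: three points (one column each) with CSR-style neighbor lists.
coords = [0.0 1.0 3.0; 0.0 0.0 0.0]
offsets, neighbors = [1, 2, 3, 3], [2, 1]
s1 = calc_chunked!(ToyStorage(zeros(2, 3), zeros(Int, 3)), coords, offsets, neighbors, 2)
s2 = calc_single_chunk!(ToyStorage(zeros(2, 3), zeros(Int, 3)), coords, offsets, neighbors)
```

Both variants produce the same result because no two iterations write the same memory; they differ only in how the points are distributed over threads.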
