Benchmarks showed ~30-70% overhead for the parallel variant with `RAYON_NUM_THREADS=1`. The discrepancy seems to be primarily related to `rayon`: preliminary investigation showed that replacing e.g. `into_par_iter` with `into_iter` removes most of the overhead. Further overhead could be removed by using atomic locks (though this requires more thought for efficiently handling the multi-threaded case).
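For reference, the iterator swap used to isolate the overhead can be sketched as below. This is a minimal illustration, not the benchmarked code: the `parallel` Cargo feature name and the `sum_of_squares` workload are hypothetical, and `rayon` is assumed to be a dependency whenever that feature is enabled.

```rust
// Hypothetical `parallel` feature: when enabled, `rayon` must be a dependency.
#[cfg(feature = "parallel")]
use rayon::prelude::*;

/// Stand-in workload (an assumption, not the code from the benchmarks).
pub fn sum_of_squares(values: Vec<u64>) -> u64 {
    // Parallel variant: the work is handed to rayon's thread pool,
    // even when RAYON_NUM_THREADS=1.
    #[cfg(feature = "parallel")]
    let iter = values.into_par_iter();

    // Sequential variant: the work stays on the calling thread.
    #[cfg(not(feature = "parallel"))]
    let iter = values.into_iter();

    // Both iterator types provide `map` and `sum`, so the rest is shared.
    iter.map(|x| x * x).sum()
}
```

Benchmarking the same call, e.g. `sum_of_squares(vec![1, 2, 3])`, once with and once without the feature (and with `RAYON_NUM_THREADS=1` in the parallel case) should show how much of the gap comes from the iterator choice alone.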
Update: Wrapping all the code in a `rayon::scope(|_| { ... })` closure seems to remove a significant part of the overhead (but not all). This suggests that the switch between the main thread and the rayon thread for the iterator might be part of the culprit, perhaps because the cache of the rayon thread will be "cold" compared to using the main thread all the way.
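A rough sketch of that experiment, under the same caveats: `workload` and `run` are hypothetical stand-ins for the real code, and the `parallel` feature name is an assumption. `rayon::scope` runs its closure inside the thread pool, so everything in `workload` starts on a rayon thread instead of crossing over from the main thread at each parallel iterator.

```rust
// Hypothetical `parallel` feature: when enabled, `rayon` must be a dependency.
#[cfg(feature = "parallel")]
use rayon::prelude::*;

/// Stand-in for "all the code" (an assumption, not the benchmarked code).
fn workload(values: &[u64]) -> u64 {
    #[cfg(feature = "parallel")]
    let iter = values.par_iter();
    #[cfg(not(feature = "parallel"))]
    let iter = values.iter();

    iter.map(|x| x * x).sum()
}

/// Runs the workload; in the parallel build the whole call is wrapped in
/// `rayon::scope`, so it executes on a pool thread from the start.
pub fn run(values: &[u64]) -> u64 {
    #[cfg(feature = "parallel")]
    let total = rayon::scope(|_| workload(values));

    #[cfg(not(feature = "parallel"))]
    let total = workload(values);

    total
}
```

If the cold-cache explanation holds, timing `run` against a direct call to `workload` in the parallel build should reproduce part of the difference described above; this sketch alone does not confirm the cause.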