`CsrParAssembler::assemble_pattern` significantly slower than `CsrAssembler::assemble_pattern` on a single thread #58

Andlon · 2023-02-24T12:41:28Z

Benchmarks showed ~30-70% overhead for the parallel variant with `RAYON_NUM_THREADS=1`. The discrepancy seems to be primarily related to rayon: some preliminary investigation showed that replacing e.g. `into_par_iter` with `into_iter` removes most of the overhead. Further overhead could be removed by using atomic locks (though this requires more thought for efficiently handling the multi-threaded case).


Andlon · 2023-02-24T12:42:49Z

Update: Chucking all the code into a `rayon::scope(|_| { ... })` closure seems to remove a significant part of the overhead (but not all). This suggests that the switch between the main thread and the rayon thread for the iterator might be part of the culprit, perhaps because the cache of the rayon thread will be "cold" compared to using the main thread all the way.
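One rough way to probe the thread hand-off hypothesis without rayon is a std-only harness like the sketch below. It is not the crate's code: `build_rows`, `count_unique`, `same_thread` and `handed_off` are hypothetical names, the workload is a made-up stand-in for pattern assembly, and `std::thread::scope` stands in for rayon's worker thread. The hand-off timing also includes thread spawn cost, which a warm rayon pool would not pay, so the numbers are indicative at best.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for assembly input: column indices for each row.
fn build_rows(num_rows: usize, per_row: usize) -> Vec<Vec<usize>> {
    (0..num_rows)
        .map(|i| (0..per_row).map(|j| (i * 31 + j * 17) % num_rows).collect())
        .collect()
}

/// Hypothetical stand-in for pattern assembly: sort and deduplicate each row,
/// then count the total number of unique entries.
fn count_unique(rows: &[Vec<usize>]) -> usize {
    rows.iter()
        .map(|row| {
            let mut cols = row.clone();
            cols.sort_unstable();
            cols.dedup();
            cols.len()
        })
        .sum()
}

/// Prepare the data and process it on the calling thread.
fn same_thread(num_rows: usize, per_row: usize) -> (usize, Duration) {
    let rows = build_rows(num_rows, per_row);
    let start = Instant::now();
    let n = count_unique(&rows);
    (n, start.elapsed())
}

/// Prepare the data on the calling thread, then process it on another thread.
/// The measured time includes spawning and joining that thread.
fn handed_off(num_rows: usize, per_row: usize) -> (usize, Duration) {
    let rows = build_rows(num_rows, per_row);
    let start = Instant::now();
    let n = thread::scope(|s| s.spawn(|| count_unique(&rows)).join().unwrap());
    (n, start.elapsed())
}

fn main() {
    let (a, t_same) = same_thread(2_000, 16);
    let (b, t_off) = handed_off(2_000, 16);
    // Both paths must compute the same result; only the timing may differ.
    assert_eq!(a, b);
    println!("same thread: {:?}, handed off: {:?}", t_same, t_off);
}
```

A proper measurement would repeat each variant many times in a release build and compare against the actual assemblers.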
