We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Helgrind runs fail randomly on concurrent critnib write and read. Possibly related to the issue #886.
Examples of failed jobs: https://github.com/oneapi-src/unified-memory-framework/actions/runs/11481262621/job/31999571534 https://github.com/oneapi-src/unified-memory-framework/actions/runs/11719465368/job/32644634889
Although failures seem to occur randomly throughout different PRs from time to time, there is one branch were the issue occurres always. Here is the branch that might be useful for debugging: https://github.com/PatKamin/unified-memory-framework/tree/helgrind-bug
==5747== ---Thread-Announcement------------------------------------------ ==5747== ==5747== Thread #1 is the program's root thread ==5747== ==5747== ---------------------------------------------------------------- ==5747== ==5747== Thread #1: pthread_rwlock_{rd,rw}lock with a pthread_mutex_t* argument ==5747== at 0x4852514: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==5747== by 0x15B0E7: std::__glibcxx_rwlock_wrlock(pthread_rwlock_t*) (shared_mutex:80) ==5747== by 0x15B13B: std::__shared_mutex_pthread::lock() (shared_mutex:193) ==5747== by 0x15B1E7: std::shared_timed_mutex::lock() (shared_mutex:473) ==5747== by 0x15C13F: std::lock_guard<std::shared_timed_mutex>::lock_guard(std::shared_timed_mutex&) (std_mutex.h:229) ==5747== by 0x157846: Slab::regSlabByAddr(void*, Slab&) (pool_disjoint.cpp:535) ==5747== by 0x157AD4: Slab::regSlab(Slab&) (pool_disjoint.cpp:563) ==5747== by 0x157348: Slab::Slab(Bucket&) (pool_disjoint.cpp:463) ==5747== by 0x15C[371](https://github.com/oneapi-src/unified-memory-framework/actions/runs/11481262621/job/31999571534#step:8:372): std::_MakeUniq<Slab>::__single_object std::make_unique<Slab, Bucket&>(Bucket&) (unique_ptr.h:962) ==5747== by 0x157E1C: Bucket::getAvailFullSlab(bool&) (pool_disjoint.cpp:617) ==5747== by 0x157F59: Bucket::getSlab(bool&) (pool_disjoint.cpp:631) ==5747== by 0x1590E5: DisjointPool::AllocImpl::allocate(unsigned long, bool&) (pool_disjoint.cpp:853) ==5747== { <insert_a_suppression_name_here> Helgrind:Misc obj:/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so fun:_ZStL23__glibcxx_rwlock_wrlockP16pthread_rwlock_t fun:_ZNSt22__shared_mutex_pthread4lockEv fun:_ZNSt18shared_timed_mutex4lockEv fun:_ZNSt10lock_guardISt18shared_timed_mutexEC1ERS0_ fun:_ZN4Slab13regSlabByAddrEPvRS_ fun:_ZN4Slab7regSlabERS_ fun:_ZN4SlabC1ER6Bucket fun:_ZSt11make_uniqueI4SlabJR6BucketEENSt9_MakeUniqIT_E15__single_objectEDpOT0_ fun:_ZN6Bucket16getAvailFullSlabERb fun:_ZN6Bucket7getSlabERb fun:_ZN12DisjointPool9AllocImpl8allocateEmRb } ==5747== ---------------------------------------------------------------- ==5747== ==5747== Thread #1: pthread_rwlock_unlock with a pthread_mutex_t* argument ==5747== at 0x4852CD6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==5747== by 0x15B116: std::__glibcxx_rwlock_unlock(pthread_rwlock_t*) (shared_mutex:82) ==5747== by 0x15B16D: std::__shared_mutex_pthread::unlock() (shared_mutex:213) ==5747== by 0x15B207: std::shared_timed_mutex::unlock() (shared_mutex:475) ==5747== by 0x15C162: std::lock_guard<std::shared_timed_mutex>::~lock_guard() (std_mutex.h:235) ==5747== by 0x15787C: Slab::regSlabByAddr(void*, Slab&) (pool_disjoint.cpp:537) ==5747== by 0x157AD4: Slab::regSlab(Slab&) (pool_disjoint.cpp:563) ==5747== by 0x157348: Slab::Slab(Bucket&) (pool_disjoint.cpp:463) ==5747== by 0x15C371: std::_MakeUniq<Slab>::__single_object std::make_unique<Slab, Bucket&>(Bucket&) (unique_ptr.h:962) ==5747== by 0x157E1C: Bucket::getAvailFullSlab(bool&) (pool_disjoint.cpp:617) ==5747== by 0x157F59: Bucket::getSlab(bool&) (pool_disjoint.cpp:631) ==5747== by 0x1590E5: DisjointPool::AllocImpl::allocate(unsigned long, bool&) (pool_disjoint.cpp:853) ==5747== { <insert_a_suppression_name_here> Helgrind:Misc obj:/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so fun:_ZStL23__glibcxx_rwlock_unlockP16pthread_rwlock_t fun:_ZNSt22__shared_mutex_pthread6unlockEv fun:_ZNSt18shared_timed_mutex6unlockEv fun:_ZNSt10lock_guardISt18shared_timed_mutexED1Ev fun:_ZN4Slab13regSlabByAddrEPvRS_ fun:_ZN4Slab7regSlabERS_ fun:_ZN4SlabC1ER6Bucket fun:_ZSt11make_uniqueI4SlabJR6BucketEENSt9_MakeUniqIT_E15__single_objectEDpOT0_ fun:_ZN6Bucket16getAvailFullSlabERb fun:_ZN6Bucket7getSlabERb fun:_ZN12DisjointPool9AllocImpl8allocateEmRb } ==5747== ---------------------------------------------------------------- ==5747== ==5747== Thread #1: pthread_rwlock_{rd,rw}lock with a pthread_mutex_t* argument ==5747== at 0x4852704: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==5747== by 0x15B0B8: std::__glibcxx_rwlock_rdlock(pthread_rwlock_t*) (shared_mutex:78) ==5747== by 0x15B18F: std::__shared_mutex_pthread::lock_shared() (shared_mutex:229) ==5747== by 0x15B227: std::shared_timed_mutex::lock_shared() (shared_mutex:489) ==5747== by 0x15C6D8: std::shared_lock<std::shared_timed_mutex>::shared_lock(std::shared_timed_mutex&) (shared_mutex:727) ==5747== by 0x15965B: DisjointPool::AllocImpl::deallocate(void*, bool&) (pool_disjoint.cpp:960) ==5747== by 0x159E95: DisjointPool::free(void*) (pool_disjoint.cpp:1081) ==5747== by 0x15DB31: _ZZN3umf6detail11poolOpsBaseI12DisjointPoolEE21umf_memory_pool_ops_tvENKUlPvDpT_E5_clIJS4_EEEDaS4_S6_ (cpp_helpers.hpp:77) ==5747== by 0x15FEFA: _ZZN3umf6detail11poolOpsBaseI12DisjointPoolEE21umf_memory_pool_ops_tvENUlPvDpT_E5_4_FUNIJS4_EEEDTcldtdeLKPKS7_0EonclIJS6_EEscOS4_fp_spscOS5_fp0_EES4_S6_ (cpp_helpers.hpp:77) ==5747== by 0x162B84: umfPoolFree (memory_pool.c:171) ==5747== by 0x1549E9: free_memory(umf_memory_pool_t*, void**) (malloc_compliance_tests.cpp:40) ==5747== by 0x154F03: malloc_compliance_test(umf_memory_pool_t*) (malloc_compliance_tests.cpp:76) ==5747== { <insert_a_suppression_name_here> Helgrind:Misc obj:/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so fun:_ZStL23__glibcxx_rwlock_rdlockP16pthread_rwlock_t fun:_ZNSt22__shared_mutex_pthread11lock_sharedEv fun:_ZNSt18shared_timed_mutex11lock_sharedEv fun:_ZNSt11shared_lockISt18shared_timed_mutexEC1ERS0_ fun:_ZN12DisjointPool9AllocImpl10deallocateEPvRb fun:_ZN12DisjointPool4freeEPv fun:_ZZN3umf6detail11poolOpsBaseI12DisjointPoolEE21umf_memory_pool_ops_tvENKUlPvDpT_E5_clIJS4_EEEDaS4_S6_ fun:_ZZN3umf6detail11poolOpsBaseI12DisjointPoolEE21umf_memory_pool_ops_tvENUlPvDpT_E5_4_FUNIJS4_EEEDTcldtdeLKPKS7_0EonclIJS6_EEscOS4_fp_spscOS5_fp0_EES4_S6_ fun:umfPoolFree fun:_ZL11free_memoryP17umf_memory_pool_tPPv fun:_Z22malloc_compliance_testP17umf_memory_pool_t } Helgrind: hg_main.c:308 (lockN_acquire_reader): Assertion 'lk->kind == LK_rdwr' failed. host stacktrace: ==5747== at 0x58026E7A: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x58026FA7: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x5802714B: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x58006519: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x5800E5CA: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x5800EB6A: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x58038[374](https://github.com/oneapi-src/unified-memory-framework/actions/runs/11481262621/job/31999571534#step:8:375): ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x5807F46A: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) ==5747== by 0x580C9E70: ??? (in /usr/libexec/valgrind/helgrind-amd64-linux) sched status: running_tid=1 Thread 1: status = VgTs_Runnable (lwpid 5747) ==5747== at 0x48527CF: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so) ==5747== by 0x15B0B8: std::__glibcxx_rwlock_rdlock(pthread_rwlock_t*) (shared_mutex:78) ==5747== by 0x15B18F: std::__shared_mutex_pthread::lock_shared() (shared_mutex:229) ==5747== by 0x15B227: std::shared_timed_mutex::lock_shared() (shared_mutex:489) ==5747== by 0x15C6D8: std::shared_lock<std::shared_timed_mutex>::shared_lock(std::shared_timed_mutex&) (shared_mutex:727) ==5747== by 0x15965B: DisjointPool::AllocImpl::deallocate(void*, bool&) (pool_disjoint.cpp:960) ==5747== by 0x159E95: DisjointPool::free(void*) (pool_disjoint.cpp:1081) ==5747== by 0x15DB31: _ZZN3umf6detail11poolOpsBaseI12DisjointPoolEE21umf_memory_pool_ops_tvENKUlPvDpT_E5_clIJS4_EEEDaS4_S6_ (cpp_helpers.hpp:77) ==5747== by 0x15FEFA: _ZZN3umf6detail11poolOpsBaseI12DisjointPoolEE21umf_memory_pool_ops_tvENUlPvDpT_E5_4_FUNIJS4_EEEDTcldtdeLKPKS7_0EonclIJS6_EEscOS4_fp_spscOS5_fp0_EES4_S6_ (cpp_helpers.hpp:77) ==5747== by 0x162B84: umfPoolFree (memory_pool.c:171) ==5747== by 0x1549E9: free_memory(umf_memory_pool_t*, void**) (malloc_compliance_tests.cpp:40) ==5747== by 0x154F03: malloc_compliance_test(umf_memory_pool_t*) (malloc_compliance_tests.cpp:76) ==5747== by 0x11E661: umfPoolTest_malloc_compliance_Test::TestBody() (poolFixtures.hpp:439) ==5747== by 0x1BBABE: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2638) ==5747== by 0x1B3DBC: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2674) ==5747== by 0x18D3FB: testing::Test::Run() (gtest.cc:2713) ==5747== by 0x18DE8B: testing::TestInfo::Run() (gtest.cc:2859) ==5747== by 0x18E829: testing::TestSuite::Run() (gtest.cc:3037) ==5747== by 0x19F19D: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5967) ==5747== by 0x1BCBE1: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2638) ==5747== by 0x1B5062: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2674) ==5747== by 0x19D7AB: testing::UnitTest::Run() (gtest.cc:5546) ==5747== by 0x162258: RUN_ALL_TESTS() (gtest.h:2334) ==5747== by 0x1621D1: main (gtest_main.cc:64) client stack range: [0x1FFEFFA000 0x1FFF000FFF] client SP: 0x1FFEFFE6F0 valgrind stack range: [0x100328E000 0x100338DFFF] top usage: 18984 of 1048576
often, in some cases always (always, often, rare)
Most probably a proper valgrind instrumentation needs to be added to critnib.
Are you willing to submit a pull request with a proposed change? (Yes, No) No
Requested priority: (Showstopper, High, Medium, Low) Medium
The text was updated successfully, but these errors were encountered:
This will be fixed probably by #898, so probably there is no need to debug this issue.
Sorry, something went wrong.
No branches or pull requests
Helgrind runs fail randomly on concurrent critnib write and read. Possibly related to the issue #886.
Examples of failed jobs:
https://github.com/oneapi-src/unified-memory-framework/actions/runs/11481262621/job/31999571534
https://github.com/oneapi-src/unified-memory-framework/actions/runs/11719465368/job/32644634889
Environment Information
Please provide a reproduction of the bug:
Although failures seem to occur randomly throughout different PRs from time to time, there is one branch were the issue occurres always. Here is the branch that might be useful for debugging: https://github.com/PatKamin/unified-memory-framework/tree/helgrind-bug
Actual behavior:
How often bug is revealed:
often, in some cases always
(always, often, rare)
Details
Most probably a proper valgrind instrumentation needs to be added to critnib.
Additional information about Priority and Help Requested:
Are you willing to submit a pull request with a proposed change? (Yes, No) No
Requested priority: (Showstopper, High, Medium, Low) Medium
The text was updated successfully, but these errors were encountered: