Unstable AABBPruner that crashes all the time. #339

Open
sagaceilo opened this issue Nov 27, 2024 · 5 comments

@sagaceilo

sagaceilo commented Nov 27, 2024

Library and Version

PhysX v5.4.2 (checked with 5.3.x as well - same issue)

Operating System

Windows 11

Hey!
I'm experiencing a 100% reproducible crash in library internals. It usually happens in AABBPruner, from different call stacks such as addActors, removeActors, and sometimes raycasts or anything else that touches the pruner.

Editor_debug.exe!`anonymous namespace'::CAllocator::deallocate(void * ptr) Line 176 C++
[Inline Frame] PhysXCommon_64.dll!physx::PxReflectionAllocator::deallocate(void *) Line 228 C++
[Inline Frame] PhysXCommon_64.dll!physx::PxArray<unsigned int,physx::PxReflectionAllocator>::deallocate(void * mem) Line 572 C++
PhysXCommon_64.dll!physx::PxArray<unsigned int,physx::PxReflectionAllocator>::recreate(unsigned int capacity) Line 706 C++
[Inline Frame] PhysXCommon_64.dll!physx::PxArray<unsigned int,physx::PxReflectionAllocator>::shrink() Line 452 C++
[Inline Frame] PhysXCommon_64.dll!physx::PxArray<unsigned int,physx::PxReflectionAllocator>::reset() Line 463 C++
PhysXCommon_64.dll!physx::Gu::AABBTreeUpdateMap::initMap(unsigned int nbObjects, const physx::Gu::AABBTree & tree) Line 63 C++
PhysXCommon_64.dll!physx::Gu::AABBPruner::buildStep(bool synchronousCall) Line 553 C++
PhysX_64.dll!physx::Sq::PrunerManager::sceneQueryBuildStep(void * handle) Line 475 C++
PhysX_64.dll!physx::NpScene::sceneQueriesStaticPrunerUpdate(physx::PxBaseTask * __formal) Line 686 C++
PhysX_64.dll!physx::Cm::DelegateTask<physx::NpScene,{physx::NpScene::sceneQueriesStaticPrunerUpdate,0}>::run() Line 103 C++

My use case:

I'm developing an open-world game and streaming sectors of a "collision cache". Each collision cache has a list of shapes and actors. Shapes can be shared between actors and between sectors as well. That reduces the shape count to under 300, used by around 2K actors, so those numbers aren't crazy at all.
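The shape sharing described above could be sketched roughly as follows. This is only an illustration under assumptions: `ShapeParams`, `PooledShape`, and `ShapePool` are hypothetical stand-ins, not PhysX types, and the fields that make two colliders "equal" are guesses. In the real setup, the pooled object would be a shared (non-exclusive) `PxShape`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical description of a collider; two colliders with equal
// parameters are considered interchangeable and share one shape.
struct ShapeParams {
    int type;          // e.g. 0 = box, 1 = sphere, 2 = capsule (assumed encoding)
    float hx, hy, hz;  // size parameters
    int materialId;    // assumed material handle

    bool operator<(const ShapeParams& o) const {
        return std::tie(type, hx, hy, hz, materialId) <
               std::tie(o.type, o.hx, o.hy, o.hz, o.materialId);
    }
};

// Hypothetical stand-in for the shared shape object.
struct PooledShape {
    ShapeParams params;
};

class ShapePool {
public:
    // Returns the existing shape for these parameters, creating it on first use.
    PooledShape* acquire(const ShapeParams& p) {
        auto it = shapes_.find(p);
        if (it == shapes_.end())
            it = shapes_.emplace(p, std::make_unique<PooledShape>(PooledShape{p})).first;
        assert(it->second != nullptr);
        return it->second.get();
    }

    std::size_t size() const { return shapes_.size(); }

private:
    std::map<ShapeParams, std::unique_ptr<PooledShape>> shapes_;
};
```

Note that a pool like this hands out the same pointer for every equal collider, so two identical colliders on one actor resolve to one shape pointer. The float comparison also assumes no NaN values reach the key.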
Side note: I used to have binary serialization of a collection containing exactly the same set of data, but found that creating shapes and actors manually has no perf penalty, and the sector blob is now 20 kB instead of 500 kB. The issue happened with binary-serialized collections as well, so that's not the cause.
I checked the refcounts of all shapes and actors. They look correct, with no leaks or double releases.

Each sector schedules adds or removes for the physics scene. Each frame I limit the number of actors added to 100 and the number removed/released to 50.
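A per-frame budget like this might look roughly like the sketch below. It is not the author's code: `FrameBudget`, `StreamingQueue`, and the callback parameters are hypothetical names, and processing removes before adds is an arbitrary choice made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Limits taken from the post: 100 adds and 50 removes per frame.
struct FrameBudget {
    std::size_t maxAdds = 100;
    std::size_t maxRemoves = 50;
};

template <typename Actor>
class StreamingQueue {
public:
    void scheduleAdd(Actor* a) { assert(a); pendingAdds_.push_back(a); }
    void scheduleRemove(Actor* a) { assert(a); pendingRemoves_.push_back(a); }

    // Drains at most the budgeted number of requests; the rest wait for
    // the next frame. The callbacks are where the scene would be touched.
    void processFrame(const FrameBudget& budget,
                      const std::function<void(Actor*)>& addToScene,
                      const std::function<void(Actor*)>& removeFromScene) {
        drain(pendingRemoves_, budget.maxRemoves, removeFromScene);
        drain(pendingAdds_, budget.maxAdds, addToScene);
    }

    std::size_t pendingAdds() const { return pendingAdds_.size(); }
    std::size_t pendingRemoves() const { return pendingRemoves_.size(); }

private:
    static void drain(std::deque<Actor*>& queue, std::size_t limit,
                      const std::function<void(Actor*)>& fn) {
        const std::size_t n = std::min(limit, queue.size());
        for (std::size_t i = 0; i < n; ++i) {
            fn(queue.front());
            queue.pop_front();
        }
    }

    std::deque<Actor*> pendingAdds_;
    std::deque<Actor*> pendingRemoves_;
};
```

In a setup like the one described, `processFrame` would be called once per frame in the "process scheduled tasks" step, under the scene write lock.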

I'm working with the Checked build; no asserts or warnings are reported. The Release build crashes as well.
PxSceneFlag::eREQUIRE_RW_LOCK is enabled, so every time I modify anything in the scene I use the proper scoped (write/read) locks.
Sectors contain mostly simple primitive shapes. I tested without TriangleMeshes for complex shapes, but nothing changed.
Overall nothing super custom or fancy.

Simulation loop

do
{
fetchResults( true )
sceneQueriesUpdate()

// collect dynamic actors transforms and update some game data with them
// do some begin frame stuff (non physics related)

fetchQueries( true )
flushSimulation()

// process scheduled tasks (add/remove/release), Move kinematic actors etc

simulate( fixed DT )

// render frame and collect all scheduled stuff for next frame. at this point physx is not touched until next frame.

}while( next frame --> )

What I tried

  • Changing PxDynamicTreeSecondaryPruner to eBUCKET crashes immediately in the first fetchQueries or fetchResults.
  • Changing to PxDynamicTreeSecondaryPruner::eNONE does nothing. It crashes the same way (I assumed it would not use AABBPruner at all in that case).
  • Switching to PxBroadPhaseType::eABP or PxBroadPhaseType::ePABP does not change anything.
  • Limiting the per-frame add/remove budget to even smaller numbers.
  • Using the default CPU dispatcher instead of the custom one we normally use, which integrates with our thread pool.
  • gflags is enabled for detecting heap stomps, with mimalloc disabled so that raw system pages are used. It detects nothing. mimalloc itself sometimes reports an overflow, "buffer overflow in heap block 0x036C5BE2D000 of size 28664: write after 28664 bytes", but that's long after the memory corruption happened, so there is no clear indication of where the stomp touched the data.
  • Deliberately leaking actors (not releasing them, but still removing them from the scene) still makes the pruner unhappy, so it's unlikely that an actor was released and its memory misused.

Any suggestions? :) It's starting to be a blocker for our company, as the whole team experiences these crashes.
All help would be appreciated, as I've been chasing this for a week now ;)

@PierreTerdiman

Do you really need the sceneQueriesUpdate() / fetchQueries() / flushSimulation() calls ?

Usually people just do simulate() / fetchResults(). And no such crash has been reported before, so I suspect this is caused by this somewhat unusual setup.

@PierreTerdiman

FWIW I tried adding these 3 extra calls in my test framework and didn't see any crash so far.

Can you try without eREQUIRE_RW_LOCK ? I usually don't use manual locks.

The callstack seems to indicate a crash inside the user-allocator. Could you perhaps post that code here?

Is there any possibility that you could be allocating and deallocating memory from different DLLs, and therefore possibly from different CRTs?
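For reference, a user allocator that keeps allocation and deallocation in the same module could be sketched as below. The class name `AlignedMallocAllocator` and the live-block counter are hypothetical. The method signatures are written to mirror `PxAllocatorCallback::allocate` and `deallocate`, but the class deliberately does not derive from the PhysX interface so that the sketch stays self-contained. It assumes the 16-byte alignment that PhysX expects from allocations.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class AlignedMallocAllocator {
public:
    // Returns a 16-byte aligned block; the pointer originally returned by
    // malloc is stashed just before the aligned address.
    void* allocate(std::size_t size, const char* /*typeName*/,
                   const char* /*filename*/, int /*line*/) {
        const std::size_t kAlign = 16;
        void* raw = std::malloc(size + kAlign + sizeof(void*));
        if (!raw) return nullptr;
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t aligned =
            (start + kAlign - 1) & ~static_cast<std::uintptr_t>(kAlign - 1);
        reinterpret_cast<void**>(aligned)[-1] = raw;
        ++liveBlocks_;
        return reinterpret_cast<void*>(aligned);
    }

    // Frees through the same CRT that allocated, because malloc and free
    // are both called from this one class.
    void deallocate(void* ptr) {
        if (!ptr) return;
        assert(liveBlocks_ > 0 && "deallocate without a matching allocate");
        --liveBlocks_;
        std::free(static_cast<void**>(ptr)[-1]);
    }

    std::size_t liveBlocks() const { return liveBlocks_; }

private:
    std::atomic<std::size_t> liveBlocks_{0};
};
```

If a block allocated by one module's CRT were freed by another's, the crash would typically surface inside `deallocate`, much like the call stack above, which is why the question is worth ruling out.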

@sagaceilo
Author

Today I made a breakthrough xD It was crashing due to the very same shape being attached to the actor. Maybe there should be some assert or something on the checked build.... IMHO stuff like this should be gracefully handled or at least reported as wrong API behaviour instead of randomly crashing and stomping memory internally.
I think I saw a similar issue somewhere here reported by someone.

@PierreTerdiman

I am not following. Can you explain what you mean, "the very same shape being attached to the actor" ? I agree that regular usage of the API should not crash, and instead produce an error. But I am not sure what it is you did exactly that triggered a crash.

For example it is legal to attach the same shape to multiple actors if they are not created with the exclusive flag.

@sagaceilo
Author

I had a pool of shapes that can be shared between actors. If all parameters are equal (size, type, material, etc.), then actors can get the same shape.
So imagine that an artist duplicated a collider shape by mistake (in my case a tree trunk ;P ), and then during cooking this shape was reported to be in the actor twice.
Exactly the same shape pointer was used twice in attachShape on one actor. It kept working until you started removing actors via release. I assume that for some reason the shape gets one addRef but two releases, which could provoke memory stomps.
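One way to guard against this on the application side is to drop repeated shape pointers before attaching them. The helper below is a minimal sketch, not part of PhysX: `uniqueShapes` is a hypothetical name, and it is templated so it can be tried without the SDK. With PhysX, `Shape` would be `PxShape`, and each returned pointer would be passed once to `PxRigidActor::attachShape`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns the input pointers with repeats dropped, keeping first-seen order.
// If duplicatesOut is non-null, it receives the number of repeats skipped,
// which can be logged so the duplicated collider gets fixed in the asset.
template <typename Shape>
std::vector<Shape*> uniqueShapes(const std::vector<Shape*>& shapes,
                                 std::size_t* duplicatesOut = nullptr) {
    std::unordered_set<Shape*> seen;
    std::vector<Shape*> result;
    result.reserve(shapes.size());
    std::size_t duplicates = 0;
    for (Shape* shape : shapes) {
        assert(shape != nullptr);
        if (seen.insert(shape).second)
            result.push_back(shape);  // first time this pointer is seen
        else
            ++duplicates;             // same pointer again: skip it
    }
    if (duplicatesOut) *duplicatesOut = duplicates;
    return result;
}
```

Running this at cooking time would also catch the duplicated tree trunk before it ever reaches the runtime.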
