E.g. the test `CopyTest.OutOfMemoryRecovery` (renamed to `CopyTest.DISABLED_OutOfMemoryRecovery` in #4188) runs out of memory when committing (at the time of #4188) and then corrupts the database, since the node group is constructed in two parts.
In general, I think we need to be more careful about potential allocation failures, and should keep newly allocated objects isolated until they are fully constructed, to minimize the side effects of a failure.
The issue in that test in particular is that we add a new node group object to the node group collection, but don't allocate its `persistentChunkGroup` field until later. If that allocation fails with an exception, we handle the exception, and then, when closing the database, serialize the node groups, including the incomplete one (which currently serializes fine the first time, but has issues after that).
It's worth noting that this particular situation is not actually unrecoverable at the moment: the other test in that file still succeeds by dropping the table.
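To make the two-part construction problem concrete, here is a minimal sketch of the fragile ordering next to the isolated one. All of the types here (`ChunkGroup`, `NodeGroup`, `NodeGroupCollection`) and the `failAllocation` flag are hypothetical stand-ins for illustration, not the project's real classes, and the allocation failure is simulated by throwing `std::bad_alloc`.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in: constructing it may "run out of memory".
struct ChunkGroup {
    explicit ChunkGroup(bool failAllocation) {
        if (failAllocation) {
            throw std::bad_alloc(); // simulated allocation failure
        }
    }
};

// Hypothetical stand-in for a node group whose second part is allocated separately.
struct NodeGroup {
    std::unique_ptr<ChunkGroup> persistentChunkGroup;
    bool isComplete() const { return persistentChunkGroup != nullptr; }
};

struct NodeGroupCollection {
    std::vector<std::unique_ptr<NodeGroup>> groups;

    // Fragile ordering: the group is visible in the collection before its
    // second part exists, so a failure leaves an incomplete entry behind.
    void appendTwoPhase(bool failAllocation) {
        groups.push_back(std::make_unique<NodeGroup>());
        groups.back()->persistentChunkGroup = std::make_unique<ChunkGroup>(failAllocation);
    }

    // Isolated ordering: finish building the group locally, then publish it.
    // If anything throws, the local object is destroyed and the collection
    // is left unchanged.
    void appendIsolated(bool failAllocation) {
        auto group = std::make_unique<NodeGroup>();
        group->persistentChunkGroup = std::make_unique<ChunkGroup>(failAllocation);
        assert(group->isComplete());
        groups.push_back(std::move(group));
    }
};
```

With a failing allocation, `appendTwoPhase` leaves one incomplete group in `groups` that a later serialization pass would pick up, while `appendIsolated` leaves the collection empty.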
One option for testing this problem more generally would be to repeatedly run some operation, e.g. a reasonably sized copy, with the buffer manager set up to fail randomly (or to fail after `n` allocations, if we want to be more exhaustive) as if it were out of memory, and check that we can recover.
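A rough sketch of the exhaustive "fail after `n` allocations" variant could look like the following. Everything in it is an assumption made for illustration: `FaultInjectingAllocator` stands in for a buffer manager with a failure hook, and `Table` and `copyInto` stand in for the real copy path. The sweep fails each allocation in turn, checks that no partial state is visible, and checks that a retry succeeds.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for a buffer manager that can be told to fail.
class FaultInjectingAllocator {
public:
    // Make the allocation with 0-based index `failAt` throw.
    void armFailure(long failAt) { remaining = failAt; }
    void disarm() { remaining = -1; }

    // Simulates one allocation request.
    void allocate() {
        if (remaining == 0) {
            remaining = -1; // fail once, then behave normally
            throw std::bad_alloc();
        }
        if (remaining > 0) {
            --remaining;
        }
    }

private:
    long remaining = -1; // negative means "never fail"
};

// Hypothetical table: only `committed` is visible to readers.
struct Table {
    std::vector<int> committed;
};

// Hypothetical copy: stages rows privately and publishes them at the end.
void copyInto(Table& table, const std::vector<int>& rows, FaultInjectingAllocator& alloc) {
    std::vector<int> staged;
    for (int row : rows) {
        alloc.allocate(); // one simulated allocation per row
        staged.push_back(row);
    }
    alloc.allocate(); // one simulated allocation at commit time
    table.committed.insert(table.committed.end(), staged.begin(), staged.end());
}

// Fails allocation 0, then 1, then 2, ... until the copy succeeds untouched.
// Returns how many failures were injected.
std::size_t runFaultSweep(const std::vector<int>& rows) {
    std::size_t failures = 0;
    for (long failAt = 0;; ++failAt) {
        Table table;
        FaultInjectingAllocator alloc;
        alloc.armFailure(failAt);
        try {
            copyInto(table, rows, alloc);
            assert(table.committed == rows);
            return failures; // failAt is past the last allocation: sweep done
        } catch (const std::bad_alloc&) {
            ++failures;
            assert(table.committed.empty()); // no partial state leaked
            alloc.disarm();
            copyInto(table, rows, alloc); // recovery: the retry must succeed
            assert(table.committed == rows);
        }
    }
}
```

The random variant would replace the counter with a seeded random decision inside `allocate()`; the counter version is shown because it visits every allocation site exactly once and is reproducible.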