Problem Statement
When QEMU fails to start, an exception is thrown (crete-dev/lib/cluster/vm_node_fsm.cpp, line 725 in 2124206). At this point, VMNode's recovery mechanism attempts a recovery, fails in exactly the same way, and retries indefinitely.
Note that this deadloop does not occur when QEMU terminates after VMNodeFSM has consumed a test case, because the test cases are eventually exhausted. In this case, however, no test case has been consumed yet.
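The difference between the two cases can be illustrated with a toy model. This is a hypothetical sketch, not CRETE code: `count_attempts`, `FailurePoint`, and the `max_attempts` cap are illustrative names. Each loop iteration stands for one VM (re)start by the recovery mechanism. If QEMU dies after consuming a test case, the queue drains and the loop ends. If it dies before consuming one, nothing changes between iterations, and only the artificial cap stops the model.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of VMNode's restart-on-failure loop (hypothetical, not CRETE code).
enum class FailurePoint {
    BeforeConsume, // QEMU dies while starting, before taking a test case
    AfterConsume   // QEMU dies after VMNodeFSM has consumed a test case
};

// Counts VM (re)starts until the test-case queue is empty. max_attempts is an
// artificial cap so the BeforeConsume case terminates in this model; the real
// recovery loop described in the issue has no such cap.
std::size_t count_attempts(std::size_t test_cases, FailurePoint where,
                           std::size_t max_attempts) {
    assert(max_attempts > 0);
    std::size_t remaining = test_cases;
    std::size_t attempts = 0;
    while (remaining > 0 && attempts < max_attempts) {
        ++attempts; // recovery restarts the VM
        if (where == FailurePoint::AfterConsume) {
            --remaining; // a test case was used up before QEMU died
        }
        // BeforeConsume: QEMU died during start-up and nothing was consumed,
        // so the next iteration faces exactly the same state.
    }
    return attempts;
}
```

With 5 test cases, `count_attempts(5, FailurePoint::AfterConsume, 1000)` returns 5, while `count_attempts(5, FailurePoint::BeforeConsume, 1000)` keeps restarting until it reaches the cap of 1000.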
We have encountered this in two scenarios:
- The QEMU GUI is having technical difficulties (as can occur with Xming+PuTTY).
- The QEMU image has somehow become corrupted.
Solution
One solution is to throw a special exception indicating that the failure originated in starting the VM, and that recovery should therefore not be attempted.
See moralismercatus@13c966a. In essence, I added a new exception, VMNoRecoveryException, which is thrown from start_vm and transitions VMNodeFSM to the Terminate state instead of the Error state. As a result, VMNode does not attempt to reboot the VM. This is not a complete solution: although the deadloop no longer occurs, CRETE still does not terminate, for reasons not yet understood. In addition, errors are not propagated back to Dispatch through the Terminate state. A more thorough fix is needed.
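The idea of routing a start failure to Terminate rather than Error can be sketched in plain C++. Only `VMNoRecoveryException`, `start_vm`, and the Error/Terminate states come from the description above; `NextState`, `Outcome`, and `try_start` are hypothetical, and the real VMNodeFSM is organized differently. Catching the more specific exception first sends an unrecoverable start failure to Terminate, while every other failure keeps the existing Error path that triggers recovery.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for VMNodeFSM's next states (not the real FSM types).
enum class NextState { Running, Error, Terminate };

// Marks a failure that retrying cannot fix, e.g. QEMU failing to start.
struct VMNoRecoveryException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Result of a start attempt; `error` stays empty on success.
struct Outcome {
    NextState state;
    std::string error;
};

// start_vm is passed in so the exception-to-state mapping can be shown alone.
template <typename StartVm>
Outcome try_start(StartVm start_vm) {
    try {
        start_vm();
        return {NextState::Running, {}};
    } catch (const VMNoRecoveryException& e) {
        // Must come before the std::exception handler: the derived type is
        // matched first, so VMNode skips recovery and terminates.
        return {NextState::Terminate, e.what()};
    } catch (const std::exception& e) {
        // Any other failure keeps the existing Error -> recovery path.
        return {NextState::Error, e.what()};
    }
}
```

The `error` field only hints at how a failure reason could travel with the Terminate transition. It does not by itself solve the open problems noted above: CRETE not terminating, and errors not reaching Dispatch.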