-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add device mask checks for event commands #8484
base: main
Are you sure you want to change the base?
Conversation
Check if the current device mask has only 1 bit set when calling vkCmd*Event commands. Implemented VUIDs: * VUID-vkCmdSetEvent-commandBuffer-01152 * VUID-vkCmdSetEvent2-commandBuffer-03826 * VUID-vkCmdWaitEvents-commandBuffer-01167 * VUID-vkCmdWaitEvents2-commandBuffer-03846 * VUID-vkCmdResetEvent-commandBuffer-01157 * VUID-vkCmdResetEvent2-commandBuffer-03833
CI Vulkan-ValidationLayers build queued with queue ID 247687. |
CI Vulkan-ValidationLayers build # 17353 running. |
CI Vulkan-ValidationLayers build # 17353 failed. |
tests/unit/sync_val_positive.cpp
Outdated
vk::EndCommandBuffer(m_command_buffer.handle()); | ||
} | ||
|
||
TEST_F(PositiveSyncVal, EventCmds2ValidDeviceMask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is failing CI with errors like
[ VUID-VkDeviceGroupCommandBufferBeginInfo-deviceMask-00106 ] Object 0: handle = 0x29b6b553f30, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x274fbde | vkBeginCommandBuffer(): pBeginInfo->pNext.deviceMask (0x3) is invalid, Physical device count is 1. The Vulkan spec states: deviceMask must be a valid device mask value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencer-lunarg Is this coming from a CI machine with two GPUs in it by any chance? The only logical way I can make this work is if vkEnumeratePhysicalDevices
returns two, but the physical_device_count
member of the state tracker is equal to 1 (due not having created a device group when we initialized the device).
That sounds like a real bug in the new tests. I see what would be required to fix it based on other device group tests but I wanted to confirm the setup is what I think it must be first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the machine has 1 dedicated GPU in it, but there is an integrated GPU on the CPU I am pretty sure
I unfortunately have zero experience with device groups, not sure how well things are tested, but this is some real unexplored territory... if you think there are driver bugs, we have an internal "don't run on this GPU config" internal YAML file I can update if you want for any tests (happy to do for any with your best judgement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would probably explain it. I think the latest iteration of the test is correct, but it also looks like it hangs a CI device (Pixel?). That seems like a fairly plausible outcome of a negative test that generates invalid sync code.
The NV crash is a bit less obvious. I'm not sure why that didn't replicate locally, but I'll investigate more and come back once sure it's not an error on my side. This is definitely a less robustly exercised part of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a fairly plausible outcome of a negative test that generates invalid sync code
so remember that these tests will do a if (skip == true) skip_dispatch
and so a common issue is it crashes because we are not correctly catching the VU and it gets into the driver and blows up
tests/unit/sync_val.cpp
Outdated
@@ -6560,3 +6560,136 @@ TEST_F(NegativeSyncVal, ResourceHandleIndexStability) { | |||
|
|||
m_default_queue->Wait(); | |||
} | |||
|
|||
TEST_F(NegativeSyncVal, EventCmdsInvalidDeviceMask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so these tests belong in sync_object.cpp
Sync Object
is what we are calling anything around validating the Synchronization objects in Vulkan (fence, semaphore, etc)
Sync Val
is what we are calling to the separate optional add on for validation
https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/synchronization_usage.md
tests/unit/sync_val_positive.cpp
Outdated
@@ -1930,3 +1930,107 @@ TEST_F(PositiveSyncVal, AtomicAccessFromTwoSubmits2) { | |||
m_errorMonitor->VerifyFound(); | |||
m_default_queue->Wait(); | |||
} | |||
|
|||
TEST_F(PositiveSyncVal, EventCmdsValidDeviceMask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned above, move to sync_object_positive.cpp
5371351
to
5a0ccbc
Compare
CI Vulkan-ValidationLayers build queued with queue ID 250862. |
CI Vulkan-ValidationLayers build # 17396 running. |
CI Vulkan-ValidationLayers build # 17396 failed. |
The android stack trace can be seen, for the Linux NVIDIA machine here is the stack trace from the crash on
|
The LunarG CI Checkrun failed for both the Windows-NVIDIA configuration (Spencer posted the stack trace above) and for the Android GalaxyS24 configuration. The latter failure seems to be due to a bad device that we're repairing. I'm not restarting this run because of the Windows-NVIDIA failure. The next time you push a fix, the LunarG CI Checkrun should start normally. |
No description provided.