Add device mask checks for event commands #8484

r-potter · 2024-09-02T15:49:36Z

Check if the current device mask has only 1 bit set when calling vkCmd*Event commands.

Implemented VUIDs:

* VUID-vkCmdSetEvent-commandBuffer-01152
* VUID-vkCmdSetEvent2-commandBuffer-03826
* VUID-vkCmdWaitEvents-commandBuffer-01167
* VUID-vkCmdWaitEvents2-commandBuffer-03846
* VUID-vkCmdResetEvent-commandBuffer-01157
* VUID-vkCmdResetEvent2-commandBuffer-03833
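For illustration, the "only 1 bit set" condition can be sketched as below. This is a minimal sketch, not the code from this PR: `HasSingleBitSet` and `CheckSingleBitExamples` are hypothetical names, and the mask is assumed to be a plain 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from the PR): returns true when exactly one
// bit of the device mask is set.
constexpr bool HasSingleBitSet(uint32_t device_mask) {
    // A nonzero value with a single bit set shares no bits with (value - 1).
    return device_mask != 0 && (device_mask & (device_mask - 1)) == 0;
}

// Hypothetical usage: masks with zero bits or several bits fail the check.
inline void CheckSingleBitExamples() {
    assert(HasSingleBitSet(0x1));   // device 0 only
    assert(HasSingleBitSet(0x2));   // device 1 only
    assert(!HasSingleBitSet(0x3));  // devices 0 and 1
    assert(!HasSingleBitSet(0x0));  // no device selected
}
```

The bit trick avoids depending on a specific C++ standard library version; `std::has_single_bit` from `<bit>` would be an alternative where C++20 is available.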

ci-tester-lunarg · 2024-09-02T15:49:39Z

CI Vulkan-ValidationLayers build queued with queue ID 247687.

ci-tester-lunarg · 2024-09-02T15:50:01Z

CI Vulkan-ValidationLayers build # 17353 running.

ci-tester-lunarg · 2024-09-02T16:10:35Z

CI Vulkan-ValidationLayers build # 17353 failed.

spencer-lunarg · 2024-09-02T16:33:50Z

tests/unit/sync_val_positive.cpp

+    vk::EndCommandBuffer(m_command_buffer.handle());
+}
+
+TEST_F(PositiveSyncVal, EventCmds2ValidDeviceMask) {


this is failing CI with errors like

[ VUID-VkDeviceGroupCommandBufferBeginInfo-deviceMask-00106 ] Object 0: handle = 0x29b6b553f30, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x274fbde | vkBeginCommandBuffer(): pBeginInfo->pNext.deviceMask (0x3) is invalid, Physical device count is 1. The Vulkan spec states: deviceMask must be a valid device mask value

@spencer-lunarg Is this coming from a CI machine with two GPUs in it, by any chance? The only way I can make sense of this is if vkEnumeratePhysicalDevices returns two, but the physical_device_count member of the state tracker is equal to 1 (due to not having created a device group when we initialized the device).

That sounds like a real bug in the new tests. I see what would be required to fix it based on other device group tests, but I wanted to first confirm that the setup is what I think it must be.
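To make the mismatch concrete, here is a rough sketch of how a device mask could be compared against a tracked physical device count. It is not the validation layer's actual implementation: `IsValidDeviceMask` and `CheckDeviceMaskExamples` are hypothetical names, and rejecting a zero mask is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: treats a mask as valid when it is nonzero and only
// uses bits below physical_device_count.
constexpr bool IsValidDeviceMask(uint32_t device_mask, uint32_t physical_device_count) {
    if (device_mask == 0) return false;             // assumption: empty masks are rejected
    if (physical_device_count >= 32) return true;   // every bit of a 32-bit mask is in range
    const uint32_t in_range_bits = (1u << physical_device_count) - 1u;
    return (device_mask & ~in_range_bits) == 0;     // no bit at or above the count
}

// Hypothetical usage mirroring the CI message: mask 0x3 with a count of 1.
inline void CheckDeviceMaskExamples() {
    assert(!IsValidDeviceMask(0x3, 1));  // bit 1 is out of range when only 1 device is tracked
    assert(IsValidDeviceMask(0x3, 2));   // fine once the device group has 2 devices
    assert(IsValidDeviceMask(0x1, 1));
}
```

Under these assumptions, a test that uses mask 0x3 is only valid when the device was created from a group of at least two physical devices, which matches the theory above.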

The machine has 1 dedicated GPU in it, but I am pretty sure there is also an integrated GPU on the CPU.

I unfortunately have zero experience with device groups and am not sure how well things are tested, but this is some real unexplored territory... If you think there are driver bugs, we have an internal "don't run on this GPU config" YAML file that I can update for any tests (happy to do so for any, using your best judgement).

That would probably explain it. I think the latest iteration of the test is correct, but it also looks like it hangs a CI device (Pixel?). That seems like a fairly plausible outcome of a negative test that generates invalid sync code.

The NV crash is a bit less obvious. I'm not sure why that didn't replicate locally, but I'll investigate more and come back once I'm sure it's not an error on my side. This is definitely a less robustly exercised part of the API.

> That seems like a fairly plausible outcome of a negative test that generates invalid sync code

So remember that these tests will do an if (skip == true) skip_dispatch, so a common issue is that a test crashes because we are not correctly catching the VU, and the call gets into the driver and blows up.
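The skip behaviour described here can be sketched roughly as follows. This is an illustrative model only, not the layer chassis code: `CallResult`, `InterceptCall`, and the callback parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical result type for the sketch.
enum class CallResult { kSkippedByValidation, kForwardedToDriver };

// Hypothetical interception wrapper: `validate` returns true when it reported
// an error, and in that case the call is never handed to `driver`.
inline CallResult InterceptCall(uint32_t device_mask,
                                const std::function<bool(uint32_t)>& validate,
                                const std::function<void(uint32_t)>& driver) {
    const bool skip = validate(device_mask);
    if (skip) {
        return CallResult::kSkippedByValidation;  // invalid call stops in the layer
    }
    driver(device_mask);  // only reached when validation reported nothing
    return CallResult::kForwardedToDriver;
}
```

If the check misses the error and returns false for an invalid mask, the invalid call is forwarded to the driver, which is the situation in which a negative test can crash instead of failing cleanly.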

spencer-lunarg · 2024-09-02T16:36:48Z

tests/unit/sync_val.cpp

@@ -6560,3 +6560,136 @@ TEST_F(NegativeSyncVal, ResourceHandleIndexStability) {

    m_default_queue->Wait();
 }
+
+TEST_F(NegativeSyncVal, EventCmdsInvalidDeviceMask) {


So these tests belong in sync_object.cpp.

Sync Object is what we call anything around validating the synchronization objects in Vulkan (fence, semaphore, etc.).

Sync Val is what we call the separate, optional add-on for validation:
https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/synchronization_usage.md

spencer-lunarg · 2024-09-02T16:37:17Z

tests/unit/sync_val_positive.cpp

@@ -1930,3 +1930,107 @@ TEST_F(PositiveSyncVal, AtomicAccessFromTwoSubmits2) {
    m_errorMonitor->VerifyFound();
    m_default_queue->Wait();
 }
+
+TEST_F(PositiveSyncVal, EventCmdsValidDeviceMask) {


As mentioned above, move this to sync_object_positive.cpp.

ci-tester-lunarg · 2024-09-06T12:19:37Z

CI Vulkan-ValidationLayers build queued with queue ID 250862.

ci-tester-lunarg · 2024-09-06T12:20:01Z

CI Vulkan-ValidationLayers build # 17396 running.

ci-tester-lunarg · 2024-09-06T14:08:09Z

CI Vulkan-ValidationLayers build # 17396 failed.

spencer-lunarg · 2024-09-06T15:05:05Z

CI Vulkan-ValidationLayers build # 17396 failed.

The Android stack trace can be seen. For the Linux NVIDIA machine, here is the stack trace from the crash in EventCmdsInvalidDeviceMask:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000079c0acbeb10e in ?? ()
   from /lib/x86_64-linux-gnu/libnvidia-glcore.so.550.78
[Current thread is 1 (Thread 0x79c0b06b4600 (LWP 3656320))]
#0  0x000079c0acbeb10e in ?? ()
   from /lib/x86_64-linux-gnu/libnvidia-glcore.so.550.78
#1  0x000079c0acb1d6e5 in ?? ()
   from /lib/x86_64-linux-gnu/libnvidia-glcore.so.550.78
#2  0x000079c0b0373da0 in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#3  0x000079c0b066067a in ?? () from /lib/x86_64-linux-gnu/libvulkan.so
#4  0x000079c08215c3db in vulkan_layer_chassis::CreateDevice (
    gpu=0x55bc256b6060, pCreateInfo=0x7ffd1216d5a0, pAllocator=0x0,
    pDevice=0x7ffd1216d558)
    at /home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/layers/vulkan/generated/chassis.cpp:725
#5  0x000079c0b065dcca in ?? () from /lib/x86_64-linux-gnu/libvulkan.so
#6  0x000079c0b06613ed in ?? () from /lib/x86_64-linux-gnu/libvulkan.so
#7  0x000079c0b06776fa in vkCreateDevice ()
   from /lib/x86_64-linux-gnu/libvulkan.so
#8  0x000055bc204bf899 in vkt::Device::init (this=0x55bc2bc68160, info=...)
    at /home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/tests/framework/binding.cpp:264
#9  0x000055bc204bf7db in vkt::Device::init (this=0x55bc2bc68160,
    extensions=std::vector of length 0, capacity 0, features=0x55bc2c43b988,
    create_device_pnext=0x7ffd1216dc00, all_queue_count=false)
    at /home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/tests/framework/binding.cpp:258
#10 0x000055bc2048a419 in vkt::Device::Device (this=0x55bc2bc68160,
    phy=0x55bc2c311ce0, extension_names=std::vector of length 0, capacity 0,
    features=0x55bc2c43b988, create_device_pnext=0x7ffd1216dc00,
    all_queue_count=false)
    at /home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/tests/framework/binding.h:226
#11 0x000055bc20483bf9 in VkRenderFramework::InitState (this=0x55bc2c43b2a0,
    features=0x55bc2c43b988, create_device_pnext=0x7ffd1216dc00, flags=2)
    at /home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/tests/framework/render.cpp:648
#12 0x000055bc20d2c83c in NegativeSyncObject_EventCmdsInvalidDeviceMask_Test::TestBody (this=0x55bc2c43b2a0)
    at /home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/tests/unit/sync_object.cpp:3695

lunarpapillo · 2024-09-06T19:44:03Z

The LunarG CI Checkrun failed for both the Windows-NVIDIA configuration (Spencer posted the stack trace above) and for the Android GalaxyS24 configuration. The latter failure seems to be due to a bad device that we're repairing.

I'm not restarting this run because of the Windows-NVIDIA failure. The next time you push a fix, the LunarG CI Checkrun should start normally.

r-potter requested a review from a team as a code owner September 2, 2024 15:49

spencer-lunarg reviewed Sep 2, 2024

tests: Added tests for the current device mask when using events

5a0ccbc

r-potter force-pushed the rpotter-cmd-reset-event branch from 5371351 to 5a0ccbc Compare September 6, 2024 12:19
