Closed Bug 1857929 Opened 1 year ago Closed 11 months ago

Crash [@ /usr/lib/x86_64-linux-gnu/libvulkan_intel.so+0x220a2b]

Categories

(Core :: Graphics: WebGPU, defect, P1)

x86_64
Linux
defect

Tracking


VERIFIED FIXED
121 Branch
Tracking Status
firefox121 --- verified

People

(Reporter: jkratzer, Assigned: nical)

References

(Blocks 2 open bugs)

Details

(Keywords: pernosco, testcase, Whiteboard: [bugmon:bisected,confirmed,origRev=d12a09b7c773][fuzzblocker])

Attachments

(2 files)

Testcase found while fuzzing mozilla-central rev 461a9c98a535 (built with: --enable-debug --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 461a9c98a535 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
[@ /usr/lib/x86_64-linux-gnu/libvulkan_intel.so+0x220a2b]

    ==125047==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f5b35e8ca2b bp 0x7f5c9c9f3c30 sp 0x7f5d0d1b5800 T125245)
    ==125047==The signal is caused by a WRITE memory access.
    ==125047==Hint: address points to the zero page.
        #0 0x7f5b35e8ca2b  (/usr/lib/x86_64-linux-gnu/libvulkan_intel.so+0x220a2b) (BuildId: 4375d01d531e49639965fec2d89d7da38684db5d)
        #1 0x7f5b3446a6c7  (/lib/x86_64-linux-gnu/libVkLayer_khronos_validation.so+0xe8e6c7) (BuildId: 4d2fac772f9b637bd3fd789ab892acecd55393e0)
        #2 0x7f5d6fcf8572 in ash::extensions::ext::debug_utils::DebugUtils::cmd_begin_debug_utils_label::hc9b99500570467e7 /third_party/rust/ash/src/extensions/ext/debug_utils.rs:71:9
        #3 0x7f5d6fcf8572 in wgpu_hal::vulkan::command::_$LT$impl$u20$wgpu_hal..CommandEncoder$LT$wgpu_hal..vulkan..Api$GT$$u20$for$u20$wgpu_hal..vulkan..CommandEncoder$GT$::begin_debug_marker::h4605aa3490e6367e /third_party/rust/wgpu-hal/src/vulkan/command.rs:628:26
        #4 0x7f5d6fc08790 in wgpu_core::command::_$LT$impl$u20$wgpu_core..global..Global$LT$G$GT$$GT$::command_encoder_push_debug_group::hd56f96a74b651bda /third_party/rust/wgpu-core/src/command/mod.rs:409:13
        #5 0x7f5d6fc08790 in wgpu_bindings::server::Global::command_encoder_action::h74e6e2292726295b /gfx/wgpu_bindings/src/server.rs:794:35
        #6 0x7f5d6fc10fbe in wgpu_server_command_encoder_action /gfx/wgpu_bindings/src/server.rs:842:5
        #7 0x7f5d69e7ec79 in mozilla::webgpu::WebGPUParent::RecvCommandEncoderAction(unsigned long, unsigned long, mozilla::ipc::ByteBuf const&) /dom/webgpu/ipc/WebGPUParent.cpp:1173:3
        #8 0x7f5d69e8a08a in mozilla::webgpu::PWebGPUParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PWebGPUParent.cpp:438:80
        #9 0x7f5d67f120fd in mozilla::gfx::PCanvasManagerParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PCanvasManagerParent.cpp:269:32
        #10 0x7f5d67489e0f in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /ipc/glue/MessageChannel.cpp:1800:25
        #11 0x7f5d67486b62 in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) /ipc/glue/MessageChannel.cpp:1725:9
        #12 0x7f5d674877e2 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) /ipc/glue/MessageChannel.cpp:1525:3
        #13 0x7f5d6748892f in mozilla::ipc::MessageChannel::MessageTask::Run() /ipc/glue/MessageChannel.cpp:1623:14
        #14 0x7f5d667d594d in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1192:16
        #15 0x7f5d667dc82d in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:480:10
        #16 0x7f5d6749103e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:300:20
        #17 0x7f5d673aa431 in RunHandler /ipc/chromium/src/base/message_loop.cc:363:3
        #18 0x7f5d673aa431 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:345:3
        #19 0x7f5d667d0cc3 in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:370:10
        #20 0x7f5d7b22fd0f in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #21 0x7f5d7bad0ac2 in start_thread nptl/pthread_create.c:442:8
        #22 0x7f5d7bb62a3f  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libvulkan_intel.so+0x220a2b) (BuildId: 4375d01d531e49639965fec2d89d7da38684db5d) 
    ==125047==ABORTING
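The frames above show the path a push-debug-group command takes in the GPU process. The IPC handler `WebGPUParent::RecvCommandEncoderAction` passes the serialized action to `wgpu_server_command_encoder_action`, which dispatches to `Global::command_encoder_push_debug_group`. That function calls the Vulkan HAL's `begin_debug_marker`, which reaches the driver through ash's `cmd_begin_debug_utils_label` and the validation layer. Below is a heavily simplified, hypothetical Rust model of the dispatch step. The trait `HalEncoder`, the enum `EncoderAction`, and `RecordingEncoder` are illustrative names, not the real Gecko or wgpu types:

```rust
// Hypothetical, simplified model of the dispatch path seen in the stack trace:
// IPC action -> command_encoder_action -> push/pop debug group -> HAL marker call.
// None of these types are the real Gecko or wgpu types.

/// Stand-in for the HAL command encoder (hypothetical subset of methods).
trait HalEncoder {
    fn begin_debug_marker(&mut self, label: &str);
    fn end_debug_marker(&mut self);
}

/// Hypothetical deserialized command-encoder action received over IPC.
enum EncoderAction {
    PushDebugGroup(String),
    PopDebugGroup,
}

/// Records the calls that would reach the driver, so they can be inspected.
#[derive(Default)]
struct RecordingEncoder {
    calls: Vec<String>,
}

impl HalEncoder for RecordingEncoder {
    fn begin_debug_marker(&mut self, label: &str) {
        self.calls.push(format!("begin({label})"));
    }
    fn end_debug_marker(&mut self) {
        self.calls.push("end".to_string());
    }
}

/// Plays the role of command_encoder_action: forwards each action to the HAL
/// unconditionally, without checking whether a debug group is open.
fn command_encoder_action<E: HalEncoder>(encoder: &mut E, action: &EncoderAction) {
    match action {
        EncoderAction::PushDebugGroup(label) => encoder.begin_debug_marker(label),
        EncoderAction::PopDebugGroup => encoder.end_debug_marker(),
    }
}
```

Because this model forwards every action unconditionally, a pop that precedes any push reaches the driver as an unmatched end marker, followed by the begin marker of the next push.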
Attached file Testcase

Verified bug as reproducible on mozilla-central 20231009093538-d5fd5e481ff2.
Unable to bisect testcase (Testcase reproduces on start build!):

Start: d420f9190e2f35e314aa67ee346650f86451792c (20221010033207)
End: 461a9c98a535b9896e9d394bdfee72ce90cf7afd (20231006092133)
BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)

Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]
Assignee: nobody → nical.bugzilla
Severity: -- → S3
Priority: -- → P2
Whiteboard: [bugmon:bisected,confirmed] → [bugmon:bisected,confirmed][fuzzblocker]

This bug prevents fuzzing from making progress; however, its severity is low. It is important for fuzz-blocker bugs to be addressed in a timely manner (the BugBot documentation explains why).
:nical, could you consider increasing the severity?

For more information, please see the BugBot documentation.

Flags: needinfo?(nical.bugzilla)
Severity: S3 → S2
Flags: needinfo?(nical.bugzilla)
Depends on: 1859999
Blocks: webgpu-v1
Status: NEW → ASSIGNED
Priority: P2 → P1
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED

Bug marked as FIXED but still reproduces on mozilla-central 20231106094018-925231a8fb5e. If you believe this to be incorrect, please remove the bugmon keyword to prevent further analysis.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I just tested this using m-c 20231115-d12a09b7c773 and got the following stack trace.

==43405==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fc35048ba2b bp 0x7fc4749fdd40 sp 0x7fc48fbfb840 T43504)
==43405==The signal is caused by a WRITE memory access.
==43405==Hint: address points to the zero page.
    #0 0x7fc35048ba2b  (/usr/lib/x86_64-linux-gnu/libvulkan_intel.so+0x220a2b) (BuildId: 4375d01d531e49639965fec2d89d7da38684db5d)
    #1 0x7fc31bbee6c7  (/lib/x86_64-linux-gnu/libVkLayer_khronos_validation.so+0xe8e6c7) (BuildId: 4d2fac772f9b637bd3fd789ab892acecd55393e0)
    #2 0x7fc5477ca1f7 in ash::extensions::ext::debug_utils::DebugUtils::cmd_begin_debug_utils_label::h4a6c70103bc1f252 /builds/worker/checkouts/gecko/third_party/rust/ash/src/extensions/ext/debug_utils.rs:71:9
    #3 0x7fc5477ca1f7 in wgpu_hal::vulkan::command::_$LT$impl$u20$wgpu_hal..CommandEncoder$LT$wgpu_hal..vulkan..Api$GT$$u20$for$u20$wgpu_hal..vulkan..CommandEncoder$GT$::begin_debug_marker::h9b7cb5f32d978f73 /builds/worker/checkouts/gecko/third_party/rust/wgpu-hal/src/vulkan/command.rs:628:26
    #4 0x7fc5476db484 in wgpu_core::command::_$LT$impl$u20$wgpu_core..global..Global$LT$G$GT$$GT$::command_encoder_push_debug_group::h0645c780d1991fd1 /builds/worker/checkouts/gecko/third_party/rust/wgpu-core/src/command/mod.rs:414:17
    #5 0x7fc5476db484 in wgpu_bindings::server::Global::command_encoder_action::h1a3a1fab9070d21c /builds/worker/checkouts/gecko/gfx/wgpu_bindings/src/server.rs:863:35
    #6 0x7fc5476e3e82 in wgpu_server_command_encoder_action /builds/worker/checkouts/gecko/gfx/wgpu_bindings/src/server.rs:911:5
    #7 0x7fc541914719 in mozilla::webgpu::WebGPUParent::RecvCommandEncoderAction(unsigned long, unsigned long, mozilla::ipc::ByteBuf const&) /builds/worker/checkouts/gecko/dom/webgpu/ipc/WebGPUParent.cpp:1289:3
    #8 0x7fc54192040a in mozilla::webgpu::PWebGPUParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PWebGPUParent.cpp:482:80
    #9 0x7fc53f990760 in mozilla::gfx::PCanvasManagerParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PCanvasManagerParent.cpp:279:32
    #10 0x7fc53eefe52f in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1813:25
    #11 0x7fc53eefb282 in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1732:9
    #12 0x7fc53eefbf02 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1525:3
    #13 0x7fc53eefd04f in mozilla::ipc::MessageChannel::MessageTask::Run() /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1623:14
    #14 0x7fc53e2434cd in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1192:16
    #15 0x7fc53e24a45d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
    #16 0x7fc53ef0575e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:300:20
    #17 0x7fc53ee1e3c1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
    #18 0x7fc53ee1e3c1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
    #19 0x7fc53e23e7b3 in nsThread::ThreadFunc(void*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:370:10
    #20 0x7fc551de2d0f in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
    #21 0x7fc552683ac2 in start_thread nptl/pthread_create.c:442:8
    #22 0x7fc552715a3f  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libvulkan_intel.so+0x220a2b) (BuildId: 4375d01d531e49639965fec2d89d7da38684db5d) 
Keywords: pernosco-wanted
Whiteboard: [bugmon:bisected,confirmed][fuzzblocker] → [bugmon:bisected,confirmed,origRev=d12a09b7c773][fuzzblocker]

Successfully recorded a pernosco session. A link to the pernosco session will be added here shortly.

A pernosco session for this bug can be found here.

This may be a manifestation of the issue described at https://gitlab.freedesktop.org/mesa/mesa/-/issues/7133, though that issue doesn't describe the same trigger condition.

Severity: S2 → S4

According to the Vulkan spec, it's possible that the driver can't handle "imbalanced" calls. From the spec:

When viewed from the linear series of submissions to a single queue, the calls to vkCmdBeginDebugUtilsLabelEXT and vkCmdEndDebugUtilsLabelEXT must be matched and balanced.
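The quoted rule applies to the whole linear series of submissions to one queue, not to each command buffer separately, so a label may begin in one command buffer and end in a later one. Below is a minimal Rust sketch of checking that rule over a finite series. It assumes each command buffer is modelled as a list of hypothetical `LabelCmd::Begin`/`LabelCmd::End` entries. Treating a label still open at the end of the given series as unbalanced is a simplifying assumption, since a real queue's series keeps going:

```rust
/// A debug-label command inside a command buffer (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum LabelCmd {
    Begin, // models vkCmdBeginDebugUtilsLabelEXT
    End,   // models vkCmdEndDebugUtilsLabelEXT
}

/// Returns true if, viewed as one linear series across all command buffers
/// submitted to a single queue, every End closes an earlier Begin and no
/// Begin is left open at the end of the series.
fn is_balanced(submissions: &[Vec<LabelCmd>]) -> bool {
    let mut depth: usize = 0;
    // Flatten the per-command-buffer lists into one linear series.
    for cmd in submissions.iter().flatten() {
        match cmd {
            LabelCmd::Begin => depth += 1,
            LabelCmd::End => {
                if depth == 0 {
                    return false; // End with no matching Begin
                }
                depth -= 1;
            }
        }
    }
    depth == 0
}
```

With this model, a pop-before-push sequence such as `[End, Begin]` fails the check immediately, while `[Begin]` followed by a separate `[End]` submission passes.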

Since this testcase pops a debug group before pushing one, it may violate this Vulkan requirement. The theory is wobbly, both because I don't have a machine that can reproduce the bug and because the crash happens on the push that follows the "imbalanced" pop, not on the pop itself. If the theory is correct, a possible fix would be for the Vulkan CommandEncoder to track which labels are open debug groups and only pop a group if it is open.
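The fix proposed above can be sketched as a wrapper that counts open debug groups and drops a pop when none is open, so the driver never sees an end label without a matching begin. `DebugMarkers`, `BalancedMarkers`, and `Log` are hypothetical names standing in for the real wgpu-hal Vulkan `CommandEncoder`. This illustrates the idea only and is not the change that eventually landed:

```rust
/// Hypothetical stand-in for the driver-facing debug-marker calls.
trait DebugMarkers {
    fn begin_debug_marker(&mut self, label: &str);
    fn end_debug_marker(&mut self);
}

/// Wraps an encoder and forwards an end call only when a group is open.
struct BalancedMarkers<E: DebugMarkers> {
    inner: E,
    open_groups: usize, // number of debug groups currently open
}

impl<E: DebugMarkers> BalancedMarkers<E> {
    fn new(inner: E) -> Self {
        Self { inner, open_groups: 0 }
    }

    fn push_debug_group(&mut self, label: &str) {
        self.open_groups += 1;
        self.inner.begin_debug_marker(label);
    }

    /// Returns false, and emits nothing, if there was no open group to pop.
    fn pop_debug_group(&mut self) -> bool {
        if self.open_groups == 0 {
            return false;
        }
        self.open_groups -= 1;
        self.inner.end_debug_marker();
        true
    }
}

/// Test double that records what would reach the driver.
#[derive(Default)]
struct Log(Vec<String>);

impl DebugMarkers for Log {
    fn begin_debug_marker(&mut self, label: &str) {
        self.0.push(format!("begin:{label}"));
    }
    fn end_debug_marker(&mut self) {
        self.0.push("end".to_string());
    }
}
```

A depth counter is enough here because debug groups nest strictly, so the wrapper only needs to know how many are open, not which labels they carry.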

Attachment #9363963 - Attachment description: Bug 1857929 - Set the wgpu flag to filter out labels. r=#webgpu-reviewers → hg histeditBug 1857929 - Set the wgpu flag to filter out labels. r=#webgpu-reviewers
Attachment #9363963 - Attachment description: hg histeditBug 1857929 - Set the wgpu flag to filter out labels. r=#webgpu-reviewers → Bug 1857929 - Set the wgpu flag to filter out labels. r=#webgpu-reviewers
Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6c001801363e Set the wgpu flag to filter out labels. r=webgpu-reviewers,ErichDonGubler
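The landed change is described only as "Set the wgpu flag to filter out labels". Below is a minimal Rust sketch. It assumes the flag makes the backend skip debug-marker calls entirely, so an imbalanced pop never reaches the driver. `Encoder`, `discard_labels`, `DebugMarkers`, and `CallCount` are hypothetical names, not wgpu's actual API:

```rust
/// Hypothetical stand-in for the driver-facing debug-marker calls.
trait DebugMarkers {
    fn begin_debug_marker(&mut self, label: &str);
    fn end_debug_marker(&mut self);
}

/// Hypothetical encoder front-end: when `discard_labels` is set, debug-group
/// commands are accepted but never forwarded to the driver.
struct Encoder<E: DebugMarkers> {
    inner: E,
    discard_labels: bool,
}

impl<E: DebugMarkers> Encoder<E> {
    fn push_debug_group(&mut self, label: &str) {
        if !self.discard_labels {
            self.inner.begin_debug_marker(label);
        }
    }

    fn pop_debug_group(&mut self) {
        if !self.discard_labels {
            self.inner.end_debug_marker();
        }
    }
}

/// Counts the calls that would reach the driver.
#[derive(Default)]
struct CallCount(usize);

impl DebugMarkers for CallCount {
    fn begin_debug_marker(&mut self, _label: &str) {
        self.0 += 1;
    }
    fn end_debug_marker(&mut self) {
        self.0 += 1;
    }
}
```

Compared with tracking open groups, this avoids the driver issue by not emitting labels at all, at the cost of losing those labels in graphics debugging tools.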

Re-submitted without the offending unused import that was causing the build to fail (see https://phabricator.services.mozilla.com/D193812?id=788033#6426752).

Flags: needinfo?(nical.bugzilla)
Pushed by egubler@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/31a5cdadc1e4 Set the wgpu flag to filter out labels. r=webgpu-reviewers,ErichDonGubler
Status: REOPENED → RESOLVED
Closed: 11 months ago → 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch

Verified bug as fixed on rev mozilla-central 20231118093245-391181d97b6b.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon
