Radeon VAAPI: Crash in [@ mozalloc_abort | abort | amdgpu_ctx_set_sw_reset_status]
Categories
(Core :: Graphics, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox124 | --- | disabled |
People
(Reporter: mccr8, Unassigned, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Attachments
(1 file: 14.00 KB, text/plain)
Crash report: https://crash-stats.mozilla.org/report/index/8ed3292d-7783-4fae-a95c-57e310240128
MOZ_CRASH Reason: Redirecting call to abort() to mozalloc_abort
Top 10 frames of crashing thread:
0 firefox-bin MOZ_Crash mfbt/Assertions.h:301
0 firefox-bin mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:35
1 firefox-bin abort memory/mozalloc/mozalloc_abort.cpp:88
2 libgallium_drv_video.so amdgpu_ctx_set_sw_reset_status /usr/src/debug/mesa/mesa-23.3.3/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c:462
3 libgallium_drv_video.so amdgpu_cs_submit_ib /usr/src/debug/mesa/mesa-23.3.3/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c:1785
4 libgallium_drv_video.so util_queue_thread_func /usr/src/debug/mesa/mesa-23.3.3/src/util/u_queue.c:309
5 libgallium_drv_video.so impl_thrd_routine /usr/src/debug/mesa/mesa-23.3.3/src/c11/impl/threads_posix.c:67
6 firefox-bin set_alt_signal_stack_and_start mozglue/interposers/pthread_create_interposer.cpp:81
7 libc.so.6 start_thread /usr/src/debug/glibc/glibc/nptl/pthread_create.c:444
8 libc.so.6 __GI___clone /usr/src/debug/glibc/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S:100
The volume is low here, but it looks like we're hitting an abort inside some kind of video driver, in the RDD process, so I figured I'd file it in case it was interesting.
Comment 1•8 months ago
I am seeing crash reports from builds going all the way back to Firefox 120a1, but the crashes themselves are recent: they start in December and seem to correspond to Mesa 23.3.1 or later, with an uptick around Mesa 23.3.1 in mid-December and continuing through 23.3.3. Mesa 23.3.4 was released only a week or so ago, so it may take time before we see crash reports from it. It may also have been fixed in that version, but I see nothing in the release notes indicating that.
All the crash reports seem to have in common a gfx critical error in the log: "GFX: RenderThread detected a device reset in PostUpdate".
I can't really see that it was something we changed per se.
Glenn or Andrew, does this seem like anything we've seen before on either the WR or media side with weird context loss failures in the Mesa amd driver?
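The Mesa version involved in a report can be read from the source paths in the crashing frames, which is one way to check the correlation described above. The following is a minimal sketch, assuming the frames are available as plain text lines in the format shown in this bug; `mesa_version` and `count_by_mesa_version` are hypothetical helper names, not part of any crash-stats tooling, and the `mesa-X.Y.Z` path layout depends on how the distribution packages its debug sources.

```python
import re
from collections import Counter

# Matches the "mesa-<major>.<minor>.<patch>" directory in a debug source path.
# Assumption: the distribution's debug paths contain such a directory.
MESA_PATH_RE = re.compile(r"/mesa-(\d+)\.(\d+)\.(\d+)/")


def mesa_version(frame):
    """Return the Mesa version found in a stack-frame line as an int tuple, or None."""
    match = MESA_PATH_RE.search(frame)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def count_by_mesa_version(frames):
    """Count frames per Mesa version, skipping frames without a Mesa path."""
    versions = (mesa_version(frame) for frame in frames)
    return Counter(v for v in versions if v is not None)


frames = [
    "2 libgallium_drv_video.so amdgpu_ctx_set_sw_reset_status "
    "/usr/src/debug/mesa/mesa-23.3.3/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c:462",
    "6 firefox-bin set_alt_signal_stack_and_start "
    "mozglue/interposers/pthread_create_interposer.cpp:81",
]

for version, count in count_by_mesa_version(frames).items():
    print(".".join(map(str, version)), count)
```

Because versions are returned as integer tuples, a check such as `mesa_version(frame) >= (23, 3, 1)` compares them numerically rather than as strings.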
Updated•8 months ago
Comment 2•8 months ago
I don't think I've seen anything like this on the WR side before.
Updated•8 months ago
This happens pretty regularly for me now: at least once a day when watching YouTube videos. I have submitted many crash reports so far. Is there anything I can do to help?
My setup is Firefox in Flatpak (so it uses its own Mesa libraries, not the system Mesa, which is 22.3.6), running on Debian 12. The hardware is a Ryzen 7840/Radeon 780.
It happened again after updating to Mesa 23.3.4 (git-27405fd573) and the newest stable Firefox (122.0.1) within Flatpak. I uploaded a crash report as well.
Comment 5•7 months ago
For Debian: can you please make sure that you've updated linux-firmware to the upstream version? Several GFX issues on Debian are actually root-caused to an older GPU firmware snapshot.
Hi Mario, thanks for the idea. I have updated my firmware with the upstream version:
[ 3.081154] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/psp_13_0_4_toc.bin
[ 3.081754] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/psp_13_0_4_ta.bin
[ 3.083180] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/dcn_3_1_4_dmcub.bin
[ 3.084664] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_pfp.bin
[ 3.086035] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_me.bin
[ 3.087368] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_rlc.bin
[ 3.088397] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mec.bin
[ 3.090267] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/vcn_4_0_2.bin
[ 3.092629] amdgpu 0000:c3:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
[ 3.093087] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
[ 3.093565] amdgpu 0000:c3:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)
[ 3.094016] amdgpu 0000:c3:00.0: Direct firmware load for amdgpu/gc_11_0_1_mes_2.bin failed with error -2
[ 3.095019] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes.bin
[ 3.096463] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes1.bin
[ 3.099131] [drm] Loading DMUB firmware via PSP: version=0x08000500
[ 3.099205] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_imu.bin
[ 3.100072] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/sdma_6_0_1.bin
[ 3.100227] [drm] Found VCN firmware Version ENC: 1.17 DEC: 6 VEP: 0 Revision: 10
[ 3.100239] amdgpu 0000:c3:00.0: amdgpu: Will use PSP to load VCN firmware
I will report if that changes anything.
Comment 7•7 months ago
If it's showing messages about missing firmware you haven't updated to the upstream version properly.
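The check described here can be automated by scanning the kernel log for failed firmware loads. This is a minimal sketch, assuming the log lines use the same `firmware: failed to load <path> (<errno>)` wording as the dmesg excerpt in this bug; `missing_firmware` is a hypothetical helper name, and other kernels or distributions may word the message differently.

```python
import re

# Kernel log pattern taken from the dmesg excerpt in this bug, e.g.
# "firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)".
# The trailing number is a negative errno; -2 is -ENOENT (file not found).
FAILED_RE = re.compile(r"firmware: failed to load (\S+) \((-?\d+)\)")


def missing_firmware(dmesg_lines):
    """Return {firmware_path: errno} for every failed firmware load in the log."""
    missing = {}
    for line in dmesg_lines:
        match = FAILED_RE.search(line)
        if match:
            missing[match.group(1)] = int(match.group(2))
    return missing


sample = [
    "[ 3.090267] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/vcn_4_0_2.bin",
    "[ 3.092629] amdgpu 0000:c3:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)",
    "[ 3.093565] amdgpu 0000:c3:00.0: firmware: failed to load amdgpu/gc_11_0_1_mes_2.bin (-2)",
]

for path, errno_value in missing_firmware(sample).items():
    print(f"missing: {path} (error {errno_value})")
```

An empty result means no failed loads were logged, which matches the state after the initramfs was regenerated.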
Ah, you're right, I forgot to run update-initramfs. Now there's no more missing firmware.
[ 3.086428] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/psp_13_0_4_toc.bin
[ 3.087017] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/psp_13_0_4_ta.bin
[ 3.088316] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/dcn_3_1_4_dmcub.bin
[ 3.090020] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_pfp.bin
[ 3.091383] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_me.bin
[ 3.092735] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_rlc.bin
[ 3.093779] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mec.bin
[ 3.095646] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/vcn_4_0_2.bin
[ 3.098084] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes_2.bin
[ 3.099375] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_mes1.bin
[ 3.102039] [drm] Loading DMUB firmware via PSP: version=0x08003300
[ 3.102112] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/gc_11_0_1_imu.bin
[ 3.102798] amdgpu 0000:c3:00.0: firmware: direct-loading firmware amdgpu/sdma_6_0_1.bin
[ 3.102952] [drm] Found VCN firmware Version ENC: 1.19 DEC: 7 VEP: 0 Revision: 0
[ 3.102964] amdgpu 0000:c3:00.0: amdgpu: Will use PSP to load VCN firmware
Thanks for the hint! I will update here if another crash happens. Btw, I run the 6.5 kernel from debian-backports, if that's important to know.
Linux 6.5.0-0.deb12.4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.10-1~bpo12+1 (2023-11-23) x86_64 GNU/Linux
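To confirm that a firmware update actually took effect, the version lines from two boots can be compared. The sketch below assumes the `Found VCN firmware Version` and `Loading DMUB firmware via PSP` lines keep the format shown in the two dmesg excerpts in this bug; `firmware_versions` is a hypothetical helper name, and the comparison only shows that the values changed, not which firmware is correct.

```python
import re

# Patterns taken from the dmesg excerpts in this bug.
VCN_RE = re.compile(
    r"Found VCN firmware Version ENC: (\d+)\.(\d+) DEC: (\d+) VEP: (\d+) Revision: (\d+)"
)
DMUB_RE = re.compile(r"Loading DMUB firmware via PSP: version=(0x[0-9a-fA-F]+)")


def firmware_versions(dmesg_lines):
    """Extract the VCN and DMUB firmware versions reported in a kernel log."""
    info = {}
    for line in dmesg_lines:
        match = VCN_RE.search(line)
        if match:
            enc_major, enc_minor, dec, vep, revision = map(int, match.groups())
            info["vcn"] = {
                "enc": (enc_major, enc_minor),
                "dec": dec,
                "vep": vep,
                "revision": revision,
            }
        match = DMUB_RE.search(line)
        if match:
            info["dmub"] = int(match.group(1), 16)
    return info


before = firmware_versions([
    "[ 3.099131] [drm] Loading DMUB firmware via PSP: version=0x08000500",
    "[ 3.100227] [drm] Found VCN firmware Version ENC: 1.17 DEC: 6 VEP: 0 Revision: 10",
])
after = firmware_versions([
    "[ 3.102039] [drm] Loading DMUB firmware via PSP: version=0x08003300",
    "[ 3.102952] [drm] Found VCN firmware Version ENC: 1.19 DEC: 7 VEP: 0 Revision: 0",
])

for key in sorted(before):
    if before[key] != after.get(key):
        print(f"{key} changed: {before[key]} -> {after.get(key)}")
```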
So far, no more crashes. My gut feeling is that it would have crashed at least once already with the old firmware.
I will report if that changes.
Comment 10•6 months ago
Can you please file a bug with Debian to get this fixed? It's going to make people point fingers at Firefox otherwise.
Updated•6 months ago
Comment 11•5 months ago
This issue should be closed in Firefox; it's caused by Debian not providing updated GPU firmware.
Comment 12•4 months ago
Is this the same issue as (or related to) https://gitlab.freedesktop.org/mesa/mesa/-/issues/10851 ?
Comment 13•4 months ago
It shouldn't be. The old firmware issue is specifically a Debian problem and that's an Arch issue you linked.
Comment 14•3 months ago
gdb captured firefox crash in mesa
Comment 15•3 months ago
Not sure if this is the same issue, but I had to disable WebGL completely because of frequent crashes.
Using this gdb config:
set detach-on-fork off
set mi-async on
set non-stop on
set pagination off
handle SIGPIPE nostop noprint pass
handle SIGBUS nostop noprint pass
handle SIGSYS nostop noprint pass
set history save
set history size unlimited
set history remove-duplicates unlimited
show history expansion
show commands +
set filename-display absolute
I was able to capture the exact call stack (attached).
Comment 16•28 days ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 20 desktop browser crashes on beta
- Top 5 RDD process crashes on beta
- Top 5 desktop browser crashes on Linux on beta
:bhood, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Comment 17•27 days ago
Firefox 130.0b7 seems to solve the crashes, or at least reduces them to a warning: "g_object_get_is_valid_property: object class 'GdkX11DeviceCore' has no property named 'device-id'". It doesn't happen with the same frequency, but still occurs in some cases, mostly when playing multiple videos at the same time.
Comment 18•27 days ago
(In reply to noreply from comment #17)
Firefox 130.0b7 seems to solve the crashes, or at least reduces them to a warning: "g_object_get_is_valid_property: object class 'GdkX11DeviceCore' has no property named 'device-id'". It doesn't happen with the same frequency, but still occurs in some cases, mostly when playing multiple videos at the same time.
Just to clarify: the crashes are completely gone. Only the warning I mentioned remains, maybe a couple of times a day.
Comment 19•4 hours ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
Comment 20•3 hours ago
(In reply to BugBot [:suhaib / :marco/ :calixte] from comment #19)
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit BugBot documentation.
It could be again if I start reporting everything; I only test when something has changed. I have a vague recollection that it has been like this for a while now. I will stay on stable for now.