Crash in [@ NvGlEglGetFunctions] after suspend&resume if DMABUF and THREADSAFE_GL are enabled. Fixed in Nvidia driver 545
Categories: Core :: Graphics: WebRender (defect)
People: Reporter: max; Unassigned; NeedInfo
References: Blocks 1 open bug
Details: Keywords: crash, stalled
Attachments: 2 files
Crash report: https://crash-stats.mozilla.org/report/index/d9dfc4cf-fa65-4e51-83ca-5a8e60220828
Reason: SIGSEGV / SI_KERNEL
Top 10 frames of crashing thread:
0 libnvidia-eglcore.so.515.65.01 NvGlEglGetFunctions
1 libnvidia-eglcore.so.515.65.01 NvGlEglApiInit
2 libnvidia-eglcore.so.515.65.01 NvGlEglApiInit
3 libEGL_nvidia.so.0 NvEglwlaf47906in
4 libEGL_nvidia.so.0 NvEglwlaf47906in
5 libEGL_nvidia.so.0 <.text ELF section in libEGL_nvidia.so.515.65.01>
6 libEGL_nvidia.so.0 NvEglwlaf47906in
7 libEGL_nvidia.so.0 NvEglwlaf47906in
8 libxul.so DMABufSurfaceRGBA::ReleaseTextures widget/gtk/DMABufSurface.cpp:679
9 libxul.so mozilla::wr::RenderDMABUFTextureHost::ClearCachedResources gfx/webrender_bindings/RenderDMABUFTextureHost.cpp:72
The crash appears to be triggered if an external monitor is connected/disconnected while the computer is sleeping. I can confirm this if desired. I can also supply about 10 other crash reports with the same problem, more system information, or do other debugging steps.
This might be similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1737834, but that one seems to be caused by a memory leak and shows up after Firefox has been left running for a while.
I'm using version 515 of the NVIDIA drivers, but I believe this bug was also present with version 510.
Comment 1•2 years ago
The bug has a crash signature, thus the bug will be considered confirmed.
Comment 2•2 years ago
https://gitlab.gnome.org/GNOME/mutter/-/issues/2045#note_1519500
Erik Kurzinger @ekurzinger · 4 weeks ago
The original cursor leak should definitely be fixed with that driver version, but I guess it's possible there's another leak somewhere. If you run nvidia-smi does it report suspiciously high video memory usage?
Comment 3•2 years ago
Erik, to me this looks like a driver bug that causes quite a few crashes - can you have a look / do you have an idea what could be happening?
Comment 4•2 years ago
I'm fairly certain the crash is not actually happening in NvGlEglGetFunctions. There's no way for that function to be reached from glDeleteTextures; maybe whatever is generating the backtrace is getting confused? If so, it might not even be the same issue as the other bug you mentioned. It seems like any segfault in libnvidia-eglcore.so gets mistakenly attributed to that function.
That said, there does seem to be a potential driver bug lurking somewhere. The other crash was during glTexImage2D, so it's possible the root cause is related.
Is the issue specific to Wayland like the other bug, or does it also reproduce on X11? Also, does it reproduce every time, or is it intermittent? I did try connecting an external display while the system was suspended on X11, but didn't see a crash. On Wayland, suspend seems to have been completely broken since version 510 (I was rather disturbed to discover this).
Also, if you have a reliable repro, would you be able to check the video memory usage reported by nvidia-smi? If we suspect video memory is being exhausted, that may help to confirm it, although some video memory allocations aren't tracked by that tool.
I have not tried this on Wayland; all of my crashes are on X. I'll try to put together a set of steps to reproduce this in a few minutes. I doubt that it is a memory leak, since it triggers immediately on resume. I'll check the memory usage, but it's hard to check before the crash since it happens on resume. I can definitely check afterwards, but by that point it's probably too late.
Well now I've tried:
sleeping, disconnecting external monitor, resuming
disconnecting monitor, sleeping, connecting, resuming
sleeping, resuming with monitor connected or disconnected
Nothing seems to trigger the crash, so the monitor must have been a red herring. Every time I can remember, the crash happened right after I resumed, but something else must be triggering it. I'll keep playing around and see if I can get a reliable reproduction.
Comment 8•2 years ago
This happens to me a few times a day. I also suspect it's a driver issue (using version 510).
Let me know if I can help with anything.
Comment 9•1 year ago
Crashes when I wake up my workstation from sleep; the monitors stay connected.
Driver: 525.60.11
Ubuntu 22.04.1
Nightly 110.0a1 (2022-12-18) (64-bit)
Comment 11•1 year ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Linux on release (startup)
:gw, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 12•1 year ago
Aleino, any ideas on this one?
Comment 13•1 year ago
(In reply to Glenn Watson [:gw] from comment #12)
Aleino, any ideas on this one?
From the comments above it looks like my colleague, Erik Kurzinger, is already looking into it.
Info that would be helpful:
- What is the display connection? (DP, HDMI?)
- What's the GPU?
- Does it happen on the latest available drivers?
- X11 version.
- Some repro steps -- a more detailed description of how to trigger the crash.
Comment 15•1 year ago
I don't necessarily have anything to add to this, except that I don't have an external monitor per se. My setup is an older 3770K desktop system with 2 monitors plugged into an Asus TUF Gaming OC GeForce 1660 Super. It runs Pop!_OS 22.04 with NVIDIA driver 525.85.05 and Pop!_OS kernel version 6.1.11-76060111-generic.
Monitor 1 is an LG Ultrawide (2560x1080) hooked up via an HDMI cable
Monitor 2 is a Dell FP (1600x1200) hooked up via a DVI-D cable.
When the system returns from suspend, Firefox crashes nearly every time. The only case where it maybe doesn't is when Firefox is minimized and not on screen, but I can't confirm that this is 100% effective, nor is it much of a solution. For now that's just an anecdotal theory.
If I can help in any way with testing or triage, let me know.
Comment 16•1 year ago
Given the recent uptick in crash reports should this be S2 and/or have an assignee?
Comment 17•1 year ago
(In reply to Paul Zühlcke [:pbz] from comment #9)
Does it occur on Wayland and X11 or only on one of them?
Can the crash be prevented by setting widget.dmabuf-webgl.enabled to false and restarting Firefox?
Comment 18•1 year ago (Reporter)
I have not seen this bug since I switched to Wayland. I never managed to find a reliable reproduction procedure. When it was crashing, it would crash on (almost?) every single resume. Then after a full restart it might stop (though I don't think this was consistent either), and then go back to crashing on every resume. While it was in a crashing phase, I could trigger it with certainty by a sleep/resume cycle.
Comment 19•1 year ago
Update: last night, before I suspended, I had 2 Firefox windows open, one on each monitor. I deliberately minimized them so the screen showed just the empty desktop, to test my theory. After resuming this morning, Firefox crashed, so apparently it doesn't matter whether it is minimized or not. I'd say it crashes 90%+ of the times my system suspends.
To answer the question above: Pop!_OS uses X11. This has been happening to me pretty regularly for several months.
Comment 20•1 year ago
Do you have the "PreserveVideoMemoryAllocations" feature enabled for the NVIDIA driver? If unsure, you can run "cat /proc/driver/nvidia/params" and look for the corresponding line.
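For anyone answering this on their own machine, here is a minimal Python sketch that reads the same file and reports the flag. It assumes every relevant line of /proc/driver/nvidia/params has the `Name: value` shape shown in the reply below; the helper names are hypothetical, and a missing entry is treated as disabled.

```python
from pathlib import Path

PARAMS_FILE = Path("/proc/driver/nvidia/params")


def parse_nvidia_params(text: str) -> dict:
    """Split 'Name: value' lines into a dict (format assumed from this thread)."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            params[name.strip()] = value.strip()
    return params


def preserve_video_memory_enabled(text: str) -> bool:
    # A missing entry is treated as "disabled" (assumption).
    return parse_nvidia_params(text).get("PreserveVideoMemoryAllocations", "0") == "1"


if __name__ == "__main__":
    if PARAMS_FILE.exists():
        enabled = preserve_video_memory_enabled(PARAMS_FILE.read_text())
        print("PreserveVideoMemoryAllocations enabled:", enabled)
    else:
        print("No NVIDIA params file; is the nvidia kernel module loaded?")
```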
Comment 21•1 year ago
PreserveVideoMemoryAllocations: 0
That's the default from Pop!_OS (System76) or NVIDIA, whoever set it; I haven't changed anything.
What does that feature do?
Comment 22•1 year ago
It saves the contents of video memory when the system suspends and restores them on resume. Otherwise, applications need to explicitly re-initialize all of their textures and so on, but I believe Firefox does have code to do that. I only asked in case it affects the reproducibility of the crash.
Comment 23•1 year ago
(I'm just a user/tester.)
NVIDIA driver 525 crashes after suspend & resume if DMABUF is enabled and/or if Firefox assumes driver thread safety and runs WebGL on a different thread.
https://nightly.mozilla.org, Gnome X11/Ubuntu 22.04 LTS/GTX1060/driver 525.85.05, PreserveVideoMemoryAllocations = 0.
STR:
- Open WebGL.
- Suspend & resume.
- Main process crash
https://webglsamples.org/aquarium/aquarium.html
- with widget.dmabuf-webgl.enabled=true (default): bp-95416af0-604b-4616-8b4c-16c9a0230302
- with widget.dmabuf-webgl.enabled=true (default) webgl.threadsafe-gl.force-disabled=true: bp-f4e20c66-c70b-478b-9ef2-1caac0230302
- with widget.dmabuf-webgl.enabled=true (default) webgl.use-canvas-render-thread=false: bp-a2856f5e-2d61-470f-a0ae-392d60230302
- with widget.dmabuf-webgl.enabled=true (default) webgl.use-canvas-render-thread=false webgl.threadsafe-gl.force-disabled=true: bp-18756762-2680-483e-9874-96a320230302
- with widget.dmabuf-webgl.enabled=false: bp-05f31420-27ca-4dad-805a-d7f060230302
- with widget.dmabuf-webgl.enabled=false webgl.threadsafe-gl.force-disabled=true: does not seem to crash
- with widget.dmabuf-webgl.enabled=false webgl.use-canvas-render-thread=false: bp-2143ae8d-ab44-421e-a093-5ac430230302
- with widget.dmabuf-webgl.enabled=false webgl.use-canvas-render-thread=false webgl.threadsafe-gl.force-disabled=true: does not seem to crash
- with widget.dmabuf-webgl.enabled=true (default): bp-5fb4eaf2-f05b-4afb-8db0-e9c4e0230302
- with widget.dmabuf-webgl.enabled=true (default) webgl.use-canvas-render-thread=false webgl.threadsafe-gl.force-disabled=true: bp-696a196c-1cf8-492b-bc17-086e20230302
- with widget.dmabuf-webgl.enabled=false webgl.use-canvas-render-thread=false: bp-8f34e537-2a3a-4652-af07-bfe8c0230302
- with widget.dmabuf-webgl.enabled=false webgl.use-canvas-render-thread=false webgl.threadsafe-gl.force-disabled=true: does not seem to crash
Here is the logic on which main process thread WebGL runs: https://searchfox.org/mozilla-central/rev/f7edb0b474a1a922f3285107620e802c6e19914d/gfx/ipc/CanvasManagerParent.cpp#52
a) if not threadsafe (bug 1739996 comment 2: so far only the case on Nouveau. webgl.threadsafe-gl.force-disabled=true) = run WebGL on RenderThread
b) if threadsafe and webgl.use-canvas-render-thread=true (bug 1778431) = run WebGL on CanvasRenderThread
c) if threadsafe = run WebGL on CompositorThread. "This appears to have performance benefits, possibly because the renderer thread is too busy"
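As a reading aid, the three cases above can be modelled as a small decision function. This is a hedged Python sketch of the description in this comment, not a transcription of CanvasManagerParent.cpp; the enum and function names are hypothetical, and the real code may check more conditions.

```python
from enum import Enum


class WebGLThread(Enum):
    RENDER_THREAD = "RenderThread"
    CANVAS_RENDER_THREAD = "CanvasRenderThread"
    COMPOSITOR_THREAD = "CompositorThread"


def pick_webgl_thread(threadsafe_gl: bool, use_canvas_render_thread: bool) -> WebGLThread:
    """Simplified model of cases a) to c) (hypothetical, not Firefox's code)."""
    if not threadsafe_gl:
        # a) GL not treated as thread-safe, e.g. Nouveau or
        #    webgl.threadsafe-gl.force-disabled=true
        return WebGLThread.RENDER_THREAD
    if use_canvas_render_thread:
        # b) thread-safe GL and webgl.use-canvas-render-thread=true
        return WebGLThread.CANVAS_RENDER_THREAD
    # c) thread-safe GL otherwise: run on the compositor thread
    return WebGLThread.COMPOSITOR_THREAD
```

In the tests above, the runs that did not seem to crash all combined widget.dmabuf-webgl.enabled=false with webgl.threadsafe-gl.force-disabled=true, i.e. case a) with DMABUF off.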
Background:
- Firefox internally accelerates classic Canvas via WebGL (gfx.canvas.accelerated).
- Firefox has two DMABUF WebGL modes:
  - EGL_MESA_image_dma_buf_export: preferred on proprietary NVIDIA, blocklisted for Mesa (bug 1735929 comment 25)
  - GBM
- The crash also occurs with DMABUF disabled.
GNOME Wayland on Ubuntu 22.04 LTS: GNOME glitches (wild colors) after suspend & resume, so I can't really see Firefox. Will upgrade Ubuntu and re-test.
Comment 24•1 year ago
Ok, finally got a repro with a debug build of the NV driver and I think I see the problem. It does appear to be a driver bug - our memory book-keeping is getting messed up after we resume from suspend which can cause a segfault at some random point later on.
Comment 25•1 year ago
Thanks Erik. Is there anything we can do from the Firefox side to work around this? I'm assuming not, but figured I'd ask - as it doesn't seem worthwhile blocking that driver version from hw-accel for a suspend/resume problem, but it's also a relatively high crash volume.
Comment 26•1 year ago
The driver bug can be fixed with a fairly low-risk change, so I might be able to get it into the next 530 release which should be fairly soon. In terms of a work-around until then, one option would be to force the use of GLX since the bug is specific to EGL. Another option would be to enable the aforementioned feature that preserves video memory across suspend / resume, which should have the side-effect of avoiding the problem. That can be done by setting the option "NVreg_PreserveVideoMemoryAllocations=1" for the nvidia kernel module.
Darkspirit's earlier comment mentioned some other settings that appear to prevent the crash, although note that it's basically a use-after-free error and therefore somewhat non-deterministic. So it could be the case that they just work due to luck... I'm not sure. The two things mentioned above should work for certain, though.
Comment 27•1 year ago
I guess I should also say that the specific thing triggering the bug is suspending while there are textures bound to EGLImages. I'm not familiar enough with FF internals to know how it uses such textures, but if that can be avoided somehow it should also avoid the crash.
Comment 28•1 year ago
Thanks Erik. Martin, Andrew, thoughts on what might be the easiest workaround? Would it be reasonable to force GLX for this driver version?
Comment 29•1 year ago
From what Darkspirit indicates in comment 23, I would prefer to disable DMABUF and/or THREADSAFE_GL for NVIDIA binary driver users. Putting them on GLX implies disabling DMABUF anyways.
Right now we require >= 495.44 for DMABUF. Do we have any sense of a driver range we should consider here?
Edit: Based on comment 26, this may be insufficient. Even so, I much prefer it to switching to GLX; we are trying to get away from GLX whenever possible, as it has its own threading issues.
Comment 30•1 year ago
I see it crashing in the [510.47.3.0, 525.89.2.0] range. I'd say we block DMABUF for 510.0 to 530.0 and see if that is sufficient.
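The proposed rule amounts to a version range check. A minimal Python sketch, assuming dotted numeric version strings such as 525.85.05 or the crash-stats form 510.47.3.0; the function names and the exclusive upper bound are assumptions, and Firefox's real blocklist rules are expressed in GfxInfo rather than like this. Versions are compared as integer tuples because plain string comparison would sort, e.g., "99.0" after "525.0".

```python
def parse_driver_version(version: str) -> tuple:
    """'525.89.02' -> (525, 89, 2); int() drops leading zeros."""
    return tuple(int(part) for part in version.split("."))


def dmabuf_blocked(driver_version: str, low: str = "510.0", high: str = "530.0") -> bool:
    """True when low <= driver_version < high (upper bound assumed exclusive)."""
    version = parse_driver_version(driver_version)
    return parse_driver_version(low) <= version < parse_driver_version(high)


if __name__ == "__main__":
    for v in ("495.44", "510.47.3.0", "525.85.05", "530.41.03"):
        print(v, "blocked" if dmabuf_blocked(v) else "allowed")
```

As comments 39 to 44 later show, 530 did not contain the fix, so a fixed upper bound of 530 turned out to be too low.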
Comment 31•1 year ago
Crash volume increased because bug 1806058 increased WebGL usage.
(Lee Salzman [:lsalzman] from bug 1777849 comment 57)
We set up downloadable Blocklist rules for 110 to prevent Linux + X11 users from enabling accelerated canvas2D.
I will now test these driver versions: https://packages.ubuntu.com/search?suite=jammy-updates&searchon=names&keywords=nvidia-driver-
- nvidia-driver-525: see comment 23. Only widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true prevented the crash after suspend & resume. If I disabled only one of them, the crash still occurred.
- nvidia-driver-390: black screen after boot.
Comment 32•1 year ago
Tested https://webglsamples.org/aquarium/aquarium.html on Gnome X11, Ubuntu 22.04 LTS:
- no crash with widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true, not even with multiple windows, but the fish lose their texture on the right and regain it on the left (attached screenshot). Can be fixed with F5.
nvidia-driver-470
- bp-f0354005-3840-4a2d-b41c-3c9660230302 / Asan Nightly report #15170
- widget.dmabuf-webgl.enabled=false: Firefox frozen or bp-fc60c5ca-8d83-41ee-9b7a-a4e080230302
- webgl.threadsafe-gl.force-disabled=true: no crash (because dmabuf is not supported)
- widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true: no crash (not with ASan Nightly either)
- no dmabuf support:
MOZ_LOG="Dmabuf:5" firefox/firefox
[Child 13169: Main Thread]: D/Dmabuf We're missing DRM render device!
[Child 13169: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 0 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
nvidia-driver-495 is an alias for nvidia-driver-510
- bp-b14501ad-99ce-4b9f-8cd4-c111b0230303 / Asan Nightly report #9529
- widget.dmabuf-webgl.enabled=false: bp-5ee56d6a-1c67-4631-9002-16e150230303
- webgl.threadsafe-gl.force-disabled=true: bp-733fe6b8-bdb3-4d16-a036-928c90230303
- widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true: no crash (not with ASan Nightly either)
- dmabuf is supported
MOZ_LOG="Dmabuf:5" firefox/firefox
[Parent 3974: Main Thread]: D/Dmabuf Using DRM device /dev/dri/renderD128
[Parent 3974: Main Thread]: D/Dmabuf nsDMABufDevice::Configure()
[Parent 3974: Main Thread]: D/Dmabuf Loading DMABuf system library libgbm.so.1 ...
[Parent 3974: Main Thread]: D/Dmabuf DMABuf is enabled
[ERROR glean_core] Error setting metrics feature config: Json(Error("EOF while parsing a value", line: 1, column: 0))
[Child 4164: Main Thread]: D/Dmabuf Using DRM device /dev/dri/renderD128
[Child 4164: Main Thread]: D/Dmabuf Failed to open drm render node /dev/dri/renderD128 error Permission denied
[Child 4164: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
[Parent 3974: CanvasRenderer]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
[Parent 3974: CanvasRenderer]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Create() from EGLImage UID = 1
[Parent 3974: CanvasRenderer]: D/Dmabuf imported size 1 x 1 format 34324241 planes 1 modifiers 3000000004fe010
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Serialize() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ImportSurfaceDescriptor() UID 1 size 1 x 1
[Parent 3974: CanvasRenderer]: D/Dmabuf imported size 1 x 1 format 34324241 planes 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::CreateTexture() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ReleaseTextures() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurface::ReleaseDMABuf() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ReleaseTextures() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ReleaseTextures() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurface::ReleaseDMABuf() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Create() from EGLImage UID = 3
[Parent 3974: CanvasRenderer]: D/Dmabuf imported size 1024 x 1024 format 34324241 planes 1 modifiers 300000000cdb014
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Serialize() UID 3
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ImportSurfaceDescriptor() UID 3 size 1024 x 1024
[Parent 3974: CanvasRenderer]: D/Dmabuf imported size 1024 x 1024 format 34324241 planes 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Serialize() UID 3
[Child 4164: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled
nvidia-driver-515
- bp-2cac2e2d-b434-4642-8735-7a9160230303 / Asan Nightly report #11913
- widget.dmabuf-webgl.enabled=false: Firefox frozen or bp-5fbf2058-1156-4872-8f68-e99df0230303
- webgl.threadsafe-gl.force-disabled=true: bp-605e6ce1-3579-4d06-845b-078ad0230303
- widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true: no crash (not with ASan Nightly either)
- dmabuf is supported
nvidia-driver-520 is an alias for nvidia-driver-525 (results in comment 23)
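The UseDMABuf flag in these logs is what separates the 470 run (0, "We're missing DRM render device!") from the 510 run (1). To pull those flags out of a longer log, here is a minimal Python sketch; the regular expression is modelled only on the lines quoted above, so it is an assumption that other Firefox versions print the same format. The value after widget_dmabuf_webgl_enabled is optional because one quoted line is cut off.

```python
import re

# Modelled on the "IsDMABufWebGLEnabled" lines quoted above (assumed format).
WEBGL_LINE = re.compile(
    r"\[(?P<process>\w+) (?P<pid>\d+): (?P<thread>[^\]]+)\]: D/Dmabuf "
    r"nsDMABufDevice::IsDMABufWebGLEnabled: "
    r"UseDMABuf (?P<use>\d) mUseWebGLDmabufBackend (?P<backend>\d) "
    r"widget_dmabuf_webgl_enabled ?(?P<pref>\d)?"
)


def dmabuf_webgl_flags(log_text: str) -> list:
    """Return one dict of flags per IsDMABufWebGLEnabled log line."""
    results = []
    for match in WEBGL_LINE.finditer(log_text):
        results.append({
            "process": match["process"],
            "thread": match["thread"],
            "UseDMABuf": match["use"] == "1",
            "backend": match["backend"] == "1",
            # None when the line was truncated before the pref value
            "pref": None if match["pref"] is None else match["pref"] == "1",
        })
    return results
```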
Comment 35•1 year ago
I just want to say thank you to everyone who is helping resolve this most heinous bug, most notably Darkspirit for his troubleshooting and Erik Kurzinger for his work at NVIDIA providing the actual fix. Can't wait for my distribution to roll out 530-series NVIDIA drivers with the fix baked in. Cheers!
Comment 36•1 year ago
Has anyone tried enabling the GPU process? I have layers.gpu-process.enabled set to true, and this solves the problem completely.
Yes, the GPU process crashes periodically, but it doesn't bring the whole browser down.
This is my example of the same crash - https://crash-stats.mozilla.org/report/index/66b6c18c-11c4-4896-8fa4-6d0220230304
Comment 37•1 year ago
It appears our blocklisting efforts have successfully brought the crash rate down.
(In reply to V. Korn from comment #36)
Have anyone tried to enable GPU process? Because i have layers.gpu-process.enabled set to true and this solves problem completely.
Yes, GPu process crashes periodically, but doesnt bring whole browser down.
This is my example of same crash - https://crash-stats.mozilla.org/report/index/66b6c18c-11c4-4896-8fa4-6d0220230304
Unfortunately, the GPU process as currently implemented on Linux won't work with Wayland, so we haven't invested effort in shipping it. My understanding is that to make it work with Wayland, we would need to do something similar to what we do on Android, where we proxy to the parent process at the final stages of the compositing pipeline.
Comment 38•1 year ago
(In reply to Erik Kurzinger from comment #26)
The driver bug can be fixed with a fairly low-risk change, so I might be able to get it into the next 530 release which should be fairly soon. In terms of a work-around until then, one option would be to force the use of GLX since the bug is specific to EGL. Another option would be to enable the aforementioned feature that preserves video memory across suspend / resume, which should have the side-effect of avoiding the problem. That can be done by setting the option "NVreg_PreserveVideoMemoryAllocations=1" for the nvidia kernel module.
Darkspirit's earlier comment mentioned some other settings that appear to prevent the crash, although note that it's basically a use-after-free error and therefore somewhat non-deterministic. So it could be the case that they just work due to luck... I'm not sure. The two things mentioned above should work for certain, though.
Hey Erik, can you confirm whether the fix for this bug made it into the 530 release?
https://www.nvidia.com/download/driverResults.aspx/200481/en-us/
Version: 530.41.03
Thanks
Comment 39•1 year ago
No, the fix turned out to be more complicated than I initially thought and I was unable to get it checked in in time for the release. Apologies.
Comment 40•1 year ago
Thanks Erik.
Whoever is responsible for bug https://bugzilla.mozilla.org/show_bug.cgi?id=1820055: its workaround may need to be adjusted, since it is hard-coded for NVIDIA driver releases 530 and lower.
Comment 42•1 year ago
Donald,
You're going to have to re-open bug #1820055 to extend the workaround beyond release 530. Per Erik's comment, the fix didn't make it into 530, so without changing the workaround the crash will return.
Comment 43•1 year ago
Redirecting needinfo to :aosmond for a follow-up on comment 39 - comment 42
Comment 44•1 year ago
Landing a fix in bug 1824778. When we have a confirmed working driver, we can unblock.
Comment 45•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 46•11 months ago
@aosmond so what's the state with this bug? Can you annotate it with fixed/etc if it is so?
Comment 47•11 months ago
Kelsey, this bug is on hold awaiting a driver fix from NVIDIA, which makes Erik Kurzinger from NVIDIA the primary point of contact.
After the Nvidia fix comes in then dmabuf can be re-enabled.
Any progress, Erik?
Comment 48•11 months ago
Yep, once we hear from NVIDIA on a specific driver version with the fix, we can make a more fine tuned blocklist rule.
Comment 49•10 months ago
Sorry for the slow response. The fix will be in the 545 driver release, that'll be the next major version after 535 which went public recently.
Comment 50•7 months ago
Dropping a note that 535 remains the current distributed driver release at this time. This report will remain pending on the new 545 release, but I'm setting it to stalled to indicate that no further action can be taken on it until this external event occurs. (If there's a more appropriate keyword for that kind of scenario, please correct me.)
Comment 51•6 months ago
(In reply to Erik Kurzinger from comment #49)
Sorry for the slow response. The fix will be in the 545 driver release, that'll be the next major version after 535 which went public recently.
545 has been released on Windows and is still in beta for Linux. Can you confirm whether this fix made it into the 545-series drivers, Erik?
Comment 52•6 months ago
Yes, this should be fixed in the 545 beta release.
Comment 53•6 months ago
Awesome. I guess the next step is to re-enable DMABUF and THREADSAFE_GL with a minimum NVIDIA driver version of 545.
Comment 54•6 months ago
Okay, let's enable this on nightly and early beta and see what shakes out. When the driver itself hits release, we can look at shipping to release as well.
Comment 55•5 months ago
Pop!_OS just updated to the NVIDIA 545 driver.
Pop!_OS uses the X11 windowing system.
Just tested with nightly 20231109165012 via https://packages.mozilla.org/apt
No crash when resuming from suspend anymore. It used to crash 100% of the time before the driver-version blocklist.
Tested several times and it seems good, but I assume you'll see an uptick in telemetry, which I won't be privy to, if others still get the crash.
Comment 56•4 months ago
I have Nvidia driver 545 and on top of that, never suspend my computer so the issue was never relevant to me, but still can't enable DMABUF. The preference is called "force-enabled", but Firefox 121.0b9 ignores me and disables the functionality. I am the owner of my computer, so why does setting "force-enabled" not force enabling the functionality?
While we wait the months it takes for NVIDIA driver 545 to be released beyond NFB and beta, I suggest renaming the settings currently named "force-enabled" to "suggest-enabled" and adding new "force-enabled" settings that actually force enablement. Alternatively, keep the "force-enabled" settings and add a "widget.dmabuf.override-knownissue-blocklist" setting. The user should get the final say in what happens on their computer.
Comment 57•4 months ago
(In reply to lexlexlex from comment #56)
I have Nvidia driver 545 and on top of that, never suspend my computer so the issue was never relevant to me, but still can't enable DMABUF. The preference is called "force-enabled", but Firefox 121.0b9 ignores me and disables the functionality. I am the owner of my computer, so why does setting "force-enabled" not force enabling the functionality?
While we await the months it takes to release of Nvidia driver 545 beyond NFB and beta, I suggest changing settings which are currently named "force-enabled" to "suggest-enabled" and adding new "force-enabled" settings which serve to actually force enablement. Alternatively, leave the "force-enabled" settings and add a "widget.dmabuf.override-knownissue-blocklist" setting. The user should get the final say in what happens on their computer.
Hi Lexlexlex,
I reviewed the code because of the issue you are experiencing, but it looks like we implemented widget.dmabuf.force-enabled as one would expect [1]. There are two cases, [2] and [3], where we might fail at runtime to enable DMABuf despite the pref, and I wonder if you are hitting one of them. If you could attach your about:support, I would be happy to take a look and see if I can figure out why it isn't working for you. Rest assured, our intent is to allow the user to turn it on if they want.
[1] https://searchfox.org/mozilla-central/rev/b60fe683f005785706074b8cd8a6dcbc363936e0/gfx/thebes/gfxPlatformGtk.cpp#217
[2] https://searchfox.org/mozilla-central/rev/b60fe683f005785706074b8cd8a6dcbc363936e0/gfx/thebes/gfxPlatformGtk.cpp#224
[3] https://searchfox.org/mozilla-central/rev/b60fe683f005785706074b8cd8a6dcbc363936e0/gfx/thebes/gfxPlatformGtk.cpp#239
Comment 58•4 months ago
My about:support result explicitly states in the decision log that DMABUF is "blocklisted", with a link to this issue ticket as justification:
>default | available
>user | force_enabled | Force enabled by pref
>env | blocklisted | Blocklisted by gfxInfo | Blocklisted due to known issues: [bug 1788573](https://bugzilla.mozilla.org/show_bug.cgi?id=1788573)
Comment 59•4 months ago
When attaching the file above, it changed my comment into a monospace font and broke the markdown, making me look like I don't know how to write markdown, so here's my comment formatted as intended.
My about:support result explicitly states in the decision log that DMABUF is "blocklisted", with a link to this issue ticket as justification:
default | available
user | force_enabled | Force enabled by pref
env | blocklisted | Blocklisted by gfxInfo | Blocklisted due to known issues: bug 1788573
Comment 60•4 months ago
Maybe the UI could use some improvements given it is unclear, but the status field, third from the top, is force_enabled, which means it should be used. You can enable logging and watch an H264 or VP9 video to prove it (MOZ_LOG="Dmabuf:5,DmabufRef:5" firefox or similar via about:logging).
It is true there is an entry about the blocklist, but the UI is misleading. We added the full decision log so that we could fully understand why some users get something enabled or disabled. The order of precedence is runtime/*, user/force_enabled, env/*, user/*, default/*. Since you have a user/force_enabled entry, it takes precedence over the env/blocklisted entry.
{
"name": "DMABUF",
"description": "DMABUF",
"status": "force_enabled",
"log": [
{
"type": "default",
"status": "available"
},
{
"type": "user",
"status": "force_enabled",
"message": "Force enabled by pref"
},
{
"type": "env",
"status": "blocklisted",
"failureId": "FEATURE_FAILURE_BUG_1788573",
"message": "Blocklisted by gfxInfo"
}
]
},
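To make the precedence rule concrete, here is a minimal Python sketch that derives a final status from a decision log like the one above. It models only the ordering described in this comment and is not Firefox's FeatureState code; the function name is hypothetical, and it assumes that when several entries share a rank, the first one listed wins.

```python
import json
from typing import Optional

# Precedence from this comment, highest first. "*" matches any status.
PRECEDENCE = [
    ("runtime", "*"),
    ("user", "force_enabled"),
    ("env", "*"),
    ("user", "*"),
    ("default", "*"),
]


def effective_status(log: list) -> Optional[str]:
    """Return the status of the highest-precedence entry in a decision log."""
    for entry_type, wanted in PRECEDENCE:
        for entry in log:
            if entry["type"] == entry_type and wanted in ("*", entry["status"]):
                return entry["status"]
    return None  # empty log


if __name__ == "__main__":
    feature = json.loads("""
    {"name": "DMABUF", "status": "force_enabled", "log": [
      {"type": "default", "status": "available"},
      {"type": "user", "status": "force_enabled", "message": "Force enabled by pref"},
      {"type": "env", "status": "blocklisted", "message": "Blocklisted by gfxInfo"}
    ]}""")
    print(effective_status(feature["log"]))
```

For the log quoted above this yields force_enabled, matching the reported "status" field even though a blocklisted entry is present.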
Comment 61•4 months ago
FWIW, I filed bug 1870535 for about:support being confusing.
Comment 62•4 months ago
Okay, thanks for that analysis and bug 1870535. The UI should definitely be made clearer, ideally by simply displaying the final decision in the decision log. It could have a field that says "Decision: Enabled" or something like that.
That being said, the reason I have thought it was not actually enabled is that I have been trying to figure out why HEVC (H265) hardware acceleration is disabled. There's no reason I can find. I have an Nvidia GTX 1080 Ti and all the packages I need, but it's still disabled and there's no decision log explaining it. The decision log is very nice, but I would appreciate it even further if it logged more decisions, like why exactly HEVC hardware acceleration is disabled.
Thanks for the info, and I'll stop posting since I know this is not a support forum. I wanted to give some insight into how this bug report is being used, but it seems I didn't realize what was actually happening.
Comment 63•4 months ago
Your codec support from about:support indicates we think you should be using HW decoding:
"codecSupportInfo": "H264 SW HW\nVP9 SW HW\nVP8 SW\nAV1 SW\nHEVC NONE\nTheora SW\nAAC SW\nMP3 SW\nOpus SW\nVorbis SW\nFLAC SW\nWave SW"
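The codecSupportInfo value is a newline-separated list of `CODEC MODE...` entries. A minimal Python sketch that turns it into something queryable; the format is inferred from this single sample, and the function names are hypothetical.

```python
def parse_codec_support(info: str) -> dict:
    """Map each codec to its set of decode modes; 'NONE' becomes an empty set."""
    support = {}
    for line in info.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        codec, modes = parts[0], parts[1:]
        support[codec] = {mode for mode in modes if mode != "NONE"}
    return support


def hardware_codecs(info: str) -> list:
    """Codecs that list HW decoding, sorted alphabetically."""
    return sorted(codec for codec, modes in parse_codec_support(info).items() if "HW" in modes)
```

For the string above, hardware_codecs() returns H264 and VP9, while HEVC maps to an empty set, which is the gap comment 64 points out.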
I'm happy to continue investigating if you are not getting hardware decoding. Next steps would be:
- Create a fresh profile with the necessary prefs flipped, visit profiler.firefox.com to install the profiler addon, and restart.
- Go to about:logging and select the "Media" preset
- Add ,Dmabuf:5,DmabufRef:5 to the end of the logging modules text box and hit Set Log Modules.
- Click the arrow next to the new profiler icon, and select the "Media" preset for profiling.
- Click on the profiler icon to start profiling.
- Visit a website which should have H264 HW decoding and play the video for 15 seconds, then stop.
- Click on the profiler icon to stop profiling.
- When it opens the new tab, click on "Upload Local Profile" and check all the boxes.
- File a new bug explaining your problem and give us the link it generated for the profile.
Comment 64•4 months ago
Oh wait, you said H265, not H264. Yeah it didn't detect support for that. Not sure why.
Comment 65•4 months ago
(Offtopic)
(In reply to lexlexlex from comment #62)
why HEVC (H265) hardware acceleration is disabled. There's no reason I can find.
IIUC:
GPU vendors don't buy codec patent licenses, the one who puts in the last missing software piece must do.
H265/HEVC isn't compiled and not utilized by Linux Firefox (bug 1857097). Mozilla would have to pay license fees to multiple H265 patent pools. (In theory, Raspberry Pi could integrate it in their Firefox build for their licensed devices.) Windows users have to buy HEVC Video Extensions to make it work.
bug 1601815 is the alternative to HEVC.
(H264/AVC: Many patents expired, and if the user didn't manually install ffmpeg, Linux Firefox downloads the OpenH264 decoder from Cisco who pay a capped fee for users' h264 licenses.)
Comment 66•3 months ago
545 has been shipping for a couple of months by now. Is it time to drop the EARLY_BETA_OR_EARLIER guard?