Open Bug 1788573 Opened 2 years ago Updated 9 days ago

Crash in [@ NvGlEglGetFunctions] after suspend&resume if DMABUF and THREADSAFE_GL are enabled. Fixed in Nvidia driver 545

Categories

(Core :: Graphics: WebRender, defect)

Firefox 106
x86_64
Linux

People

(Reporter: max, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: crash, stalled)

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/d9dfc4cf-fa65-4e51-83ca-5a8e60220828

Reason: SIGSEGV / SI_KERNEL

Top 10 frames of crashing thread:

0 libnvidia-eglcore.so.515.65.01 NvGlEglGetFunctions 
1 libnvidia-eglcore.so.515.65.01 NvGlEglApiInit 
2 libnvidia-eglcore.so.515.65.01 NvGlEglApiInit 
3 libEGL_nvidia.so.0 NvEglwlaf47906in 
4 libEGL_nvidia.so.0 NvEglwlaf47906in 
5 libEGL_nvidia.so.0 <.text ELF section in libEGL_nvidia.so.515.65.01> 
6 libEGL_nvidia.so.0 NvEglwlaf47906in 
7 libEGL_nvidia.so.0 NvEglwlaf47906in 
8 libxul.so DMABufSurfaceRGBA::ReleaseTextures widget/gtk/DMABufSurface.cpp:679
9 libxul.so mozilla::wr::RenderDMABUFTextureHost::ClearCachedResources gfx/webrender_bindings/RenderDMABUFTextureHost.cpp:72

The crash appears to be triggered if an external monitor is connected/disconnected while the computer is sleeping. I can confirm this if desired. I can also supply about 10 other crash reports with the same problem, more system information, or do other debugging steps.

This might be similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1737834, but that seems to be caused by a memory leak and shows up after firefox is left running for a while.

I'm using the 515 version of the nvidia drivers, but I believe that this bug was present with the 510 version as well.

The bug has a crash signature, thus the bug will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: General → Graphics: WebRender
Product: Firefox → Core
Severity: -- → S3
Flags: needinfo?(stransky)

https://gitlab.gnome.org/GNOME/mutter/-/issues/2045#note_1519500

Erik Kurzinger @ekurzinger · 4 weeks ago
The original cursor leak should definitely be fixed with that driver version, but I guess it's possible there's another leak somewhere. If you run nvidia-smi does it report suspiciously high video memory usage?

Blocks: wr-nv-linux
Keywords: crash
Hardware: ARM64 → x86_64
See Also: → 1737834
Summary: Crash in [@ NvGlEglGetFunctions] → Crash in [@ NvGlEglGetFunctions] apparently caused by connecting/disconnecting external monitor while the computer is sleeping
See Also: → 1779093

Erik, to me this looks like a driver bug that causes quite a few crashes - can you have a look / do you have an idea what could be happening?

Flags: needinfo?(ekurzinger)

I'm fairly certain the crash is not actually happening in NvGlEglGetFunctions. There's no way for that function to be reached from glDeleteTextures, maybe whatever is generating the backtrace is getting confused? In which case it might not even be the same issue as the other bug you mentioned. It seems like any segfault in libnvidia-eglcore.so gets mistakenly attributed to that function.

That said, there does seem to be a potential driver bug lurking somewhere. The other crash was during glTexImage2D, so it's possible the root cause is related.

Is the issue specific to Wayland like the other bug? Or does it also reproduce on X11? Also, does it reproduce every time, or is it intermittent? I did try connecting an external display while the system was suspended on X11, but didn't see a crash. On Wayland, suspend seems to have been completely broken since version 510 (I was rather disturbed to discover this).

Also, if you have a reliable repro, would you be able to check the video memory usage reported by nvidia-smi? If we do suspect video memory might be getting exhausted that may help to confirm it. Although there are some video memory allocations that wouldn't be tracked by that tool.

Flags: needinfo?(ekurzinger)

I have not tried this on Wayland. All of my crashes are on X. I'll try to get a set of steps to reproduce this in a few minutes. I doubt that it is a memory leak bug, since it triggers immediately on resume. I'll check the memory usage. It's a bit hard to check before the crash, since it happens on resume. I can definitely check after, but by that point it's probably too late.

Well now I've tried:

sleeping, disconnecting external monitor, resuming
disconnecting monitor, sleeping, connecting, resuming
sleeping, resuming with monitor connected or disconnected

Nothing seems to trigger the crash, so that must have just been a red herring. Every time I can remember, the crash happened right after I resumed, but something else must be triggering it. I'll keep playing around and see if I can get a reliable reproduction.

Flags: needinfo?(stransky)
Flags: needinfo?(gwatson)

This happens to me a few times a day. I also suspect it's a driver issue (using 510 version)
Let me know if I can help with anything.

See Also: → 1801892

Crashes when I wake up my workstation from sleep, monitors stay connected.
Driver: 525.60.11
Ubuntu 22.04.1
Nightly 110.0a1 (2022-12-18) (64-bit)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 5 desktop browser crashes on Linux on release (startup)

:gw, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gwatson)

Aleino, any ideas on this one?

Flags: needinfo?(gwatson) → needinfo?(aleino)

(In reply to Glenn Watson [:gw] from comment #12)

Aleino, any ideas on this one?

From the comments above it looks like my colleague, Erik Kurzinger, is already looking into it.

Info that would be helpful:

  1. What is the display connection? (DP, HDMI?)
  2. What's the GPU?
  3. Does it happen on the latest available drivers?
  4. X11 version.
  5. Some repro steps -- a more detailed description of how to trigger the crash.
Flags: needinfo?(aleino)
Duplicate of this bug: 1818077
Summary: Crash in [@ NvGlEglGetFunctions] apparently caused by connecting/disconnecting external monitor while the computer is sleeping → Crash in [@ NvGlEglGetFunctions] after suspend&resume

I don't necessarily have anything to add to this, except that I don't have an external monitor per se. My setup is an older 3770k desktop system, with 2 monitors plugged into an Asus TUF Gaming OC GeForce 1660 Super. Running Linux Pop!_OS 22.04, with NVIDIA driver 525.85.05, and Pop!_OS Linux Kernel version: 6.1.11-76060111-generic

Monitor 1 is an LG Ultrawide (2560x1080) hooked up via an HDMI cable
Monitor 2 is a Dell FP (1600x1200) hooked up via a DVI-D cable.

When it returns from suspend, it will crash Firefox nearly all the time. The only time that I can sort of say it maybe doesn't is if FF is minimized and not on screen, but I can't confirm that that is 100% effective, nor is it much of a solution. Currently that's just an anecdotal theory.

If I can help in any way for testing, triage, let me know.

Given the recent uptick in crash reports should this be S2 and/or have an assignee?

Flags: needinfo?(gwatson)

(In reply to Paul Zühlcke [:pbz] from comment #9)
Does it occur on Wayland and X11 or only on one of them?
Can the crash be prevented by setting widget.dmabuf-webgl.enabled to false and restarting Firefox?

Flags: needinfo?(pbz)
Severity: S3 → S2
Flags: needinfo?(gwatson)

I have not seen this bug since I switched to Wayland. I never could get a reliable reproducible procedure. When it was crashing, it would crash on (almost?) every single resume. Then after a full restart it might not do it (but I don't think this was consistent either). Then it would go back to crashing on every resume. When it was crashing, I could force it to happen with certainty by a sleep/resume cycle.

Update, last night before I suspended, I had 2 open FF windows, one on each monitor. I minimized them so the screen was just empty desktop. I did this consciously to try my theory. After resuming this morning, FF crashed. Apparently it doesn't matter if minimized or not. I'd say it crashes 90%+ of the time my system suspends.

Pop!_OS is X11 to answer the above question. This has been happening for me pretty regularly for several months.

Do you have the "PreserveVideoMemoryAllocations" feature enabled for the NVIDIA driver? If unsure, you can run "cat /proc/driver/nvidia/params" and look for the corresponding line.

PreserveVideoMemoryAllocations: 0

That's the default for Pop (System76)/Nvidia, whoever set that. I haven't changed anything.

What does that feature do?

It will save the contents of video memory when the system suspends and restore it on resume. Otherwise applications need to explicitly re-initialize all of their textures and stuff. But I believe Firefox does have code to do that. I only asked in case it affects the reproducibility of the crash.
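The check requested above can be scripted. The following is a minimal sketch, assuming the params file uses the `Name: value` line format shown in the reply above; the helper names `parse_nvidia_params` and `preserves_video_memory` are hypothetical and not part of any NVIDIA or Firefox tooling.

```python
from pathlib import Path

# Path quoted in the comment above; it only exists while the NVIDIA kernel module is loaded.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_nvidia_params(text: str) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict (line format assumed from the thread)."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            params[name.strip()] = value.strip()
    return params


def preserves_video_memory(text: str) -> bool:
    """Return True if PreserveVideoMemoryAllocations is present and not '0'."""
    value = parse_nvidia_params(text).get("PreserveVideoMemoryAllocations", "0")
    return value != "0"


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        print(preserves_video_memory(PARAMS_PATH.read_text()))
    else:
        print("NVIDIA params file not found")
```

A missing entry is treated the same as `0` here, which is an assumption made for the sketch rather than documented driver behavior.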

(I'm just a user/tester.)

Nvidia driver 525 crashes after suspend&resume
if Dmabuf is enabled
and/or
if Firefox assumes driver thread safety and runs WebGL on a different thread.

https://nightly.mozilla.org, Gnome X11/Ubuntu 22.04 LTS/GTX1060/driver 525.85.05, PreserveVideoMemoryAllocations = 0.
STR:

  1. Open WebGL.
  2. Suspend & resume.
  3. Main process crash

https://webglsamples.org/aquarium/aquarium.html

https://yari-demos.prod.mdn.mozit.cloud/en-US/docs/Web/API/Canvas_API/Tutorial/Basic_animations/_sample_.an_animated_solar_system.html

Here is the logic on which main process thread WebGL runs: https://searchfox.org/mozilla-central/rev/f7edb0b474a1a922f3285107620e802c6e19914d/gfx/ipc/CanvasManagerParent.cpp#52
a) if not threadsafe (bug 1739996 comment 2: so far only the case on Nouveau. webgl.threadsafe-gl.force-disabled=true) = run WebGL on RenderThread
b) if threadsafe and webgl.use-canvas-render-thread=true (bug 1778431) = run WebGL on CanvasRenderThread
c) if threadsafe = run WebGL on CompositorThread. "This appears to have performance benefits, possibly because the renderer thread is too busy"
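Cases a) to c) can be restated as a small decision function. This is only an illustrative sketch of the description in this comment, not the code in CanvasManagerParent.cpp; the function name `webgl_thread` and its boolean parameters are hypothetical stand-ins for the driver thread-safety check and the `webgl.use-canvas-render-thread` pref.

```python
def webgl_thread(threadsafe_gl: bool, use_canvas_render_thread: bool) -> str:
    """Pick the thread WebGL runs on, following cases a) to c) above."""
    if not threadsafe_gl:
        # a) GL is not considered thread-safe: stay on the render thread.
        return "RenderThread"
    if use_canvas_render_thread:
        # b) thread-safe GL and the canvas render thread pref is enabled.
        return "CanvasRenderThread"
    # c) thread-safe GL without the pref.
    return "CompositorThread"


print(webgl_thread(threadsafe_gl=False, use_canvas_render_thread=True))
print(webgl_thread(threadsafe_gl=True, use_canvas_render_thread=True))
print(webgl_thread(threadsafe_gl=True, use_canvas_render_thread=False))
```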

background

  • Firefox internally accelerates classic Canvas via WebGL (gfx.canvas.accelerated).
  • Firefox has two Dmabuf WebGL modes
    • preferred on proprietary Nvidia, blacklisted for Mesa: bug 1735929 comment 25 (EGL_MESA_image_dma_buf_export)
    • Gbm
  • The crash also occurs with disabled Dmabuf.

Gnome Wayland on Ubuntu 22.04 LTS: Gnome glitches (wild colors) after suspend&resume, can't really see Firefox. Will upgrade Ubuntu and re-test.

Flags: needinfo?(pbz)

Ok, finally got a repro with a debug build of the NV driver and I think I see the problem. It does appear to be a driver bug - our memory book-keeping is getting messed up after we resume from suspend which can cause a segfault at some random point later on.

Thanks Erik. Is there anything we can do from the Firefox side to work around this? I'm assuming not, but figured I'd ask - as it doesn't seem worthwhile blocking that driver version from hw-accel for a suspend/resume problem, but it's also a relatively high crash volume.

Flags: needinfo?(ekurzinger)

The driver bug can be fixed with a fairly low-risk change, so I might be able to get it into the next 530 release which should be fairly soon. In terms of a work-around until then, one option would be to force the use of GLX since the bug is specific to EGL. Another option would be to enable the aforementioned feature that preserves video memory across suspend / resume, which should have the side-effect of avoiding the problem. That can be done by setting the option "NVreg_PreserveVideoMemoryAllocations=1" for the nvidia kernel module.

Darkspirit's earlier comment mentioned some other settings that appear to prevent the crash, although note that it's basically a use-after-free error and therefore somewhat non-deterministic. So it could be the case that they just work due to luck... I'm not sure. The two things mentioned above should work for certain, though.

Flags: needinfo?(ekurzinger)

I guess I should also say that the specific thing triggering the bug is suspending while there are textures bound to EGLImages. I'm not familiar enough with FF internals to know how it uses such textures, but if that can be avoided somehow it should also avoid the crash.

Thanks Erik. Martin, Andrew, thoughts on what might be the easiest workaround? Would it be reasonable to force GLX for this driver version?

Flags: needinfo?(stransky)
Flags: needinfo?(aosmond)

From what Darkspirit indicates in comment 23, I would prefer to disable DMABUF and/or THREADSAFE_GL for NVIDIA binary driver users. Putting them on GLX implies disabling DMABUF anyways.

Right now we require >= 495.44 for DMABUF. Do we have any sense of a driver range we should consider here?

Edit: Based on comment 26, maybe it is insufficient. I think it is much preferred to switching to GLX. We are trying to get away from GLX whenever possible as it has its own threading issues.

I see it crashing in [510.47.3.0, 525.89.2.0] range. I'd say block DMABUF for 510.0 to 530.0 and see if that is sufficient.
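A version-range rule like the one proposed here comes down to a tuple comparison. The sketch below is hypothetical and is not how Firefox's gfxInfo blocklist is implemented; the function names are made up, the bounds are parameters because the right range was still under discussion, and treating the upper bound as exclusive is an assumption.

```python
def parse_driver_version(version: str) -> tuple[int, ...]:
    """Turn a dotted driver version such as '525.89.2.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def dmabuf_blocked(version: str,
                   first_blocked: str = "510.0",
                   first_unblocked: str = "530.0") -> bool:
    """True if version falls in [first_blocked, first_unblocked)."""
    v = parse_driver_version(version)
    return parse_driver_version(first_blocked) <= v < parse_driver_version(first_unblocked)


for candidate in ("495.44", "510.47.3.0", "525.89.2.0", "530.0"):
    print(candidate, dmabuf_blocked(candidate))
```

Comparing integer tuples avoids the pitfalls of string comparison, where a component such as `89` would sort after `108`.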

Crash volume increased because bug 1806058 increased WebGL usage.
(Lee Salzman [:lsalzman] from bug 1777849 comment 57)

We set up downloadable Blocklist rules for 110 to prevent Linux + X11 users from enabling accelerated canvas2D.

I will now test these driver versions: https://packages.ubuntu.com/search?suite=jammy-updates&searchon=names&keywords=nvidia-driver-

  • nvidia-driver-525: comment 23 = Only widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true prevented the crash after suspend&resume. If I disabled only one of them, the crash still occurred.
  • nvidia-driver-390: black screen after boot.

Tested https://webglsamples.org/aquarium/aquarium.html on Gnome X11, Ubuntu 22.04 LTS:

  • no crash with widget.dmabuf-webgl.enabled=false + webgl.threadsafe-gl.force-disabled=true, neither with multiple windows, but fishes lose their texture on the right and regain it on the left (attached screenshot). Can be fixed with F5.

nvidia-driver-470

MOZ_LOG="Dmabuf:5" firefox/firefox
[Child 13169: Main Thread]: D/Dmabuf We're missing DRM render device!
[Child 13169: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 0 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1

nvidia-driver-495 is an alias for nvidia-driver-510

MOZ_LOG="Dmabuf:5" firefox/firefox
[Parent 3974: Main Thread]: D/Dmabuf Using DRM device /dev/dri/renderD128
[Parent 3974: Main Thread]: D/Dmabuf nsDMABufDevice::Configure()
[Parent 3974: Main Thread]: D/Dmabuf Loading DMABuf system library libgbm.so.1 ...
[Parent 3974: Main Thread]: D/Dmabuf DMABuf is enabled
[ERROR glean_core] Error setting metrics feature config: Json(Error("EOF while parsing a value", line: 1, column: 0))
[Child 4164: Main Thread]: D/Dmabuf Using DRM device /dev/dri/renderD128
[Child 4164: Main Thread]: D/Dmabuf Failed to open drm render node /dev/dri/renderD128 error Permission denied
[Child 4164: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
[Parent 3974: CanvasRenderer]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
[Parent 3974: CanvasRenderer]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Create() from EGLImage UID = 1
[Parent 3974: CanvasRenderer]: D/Dmabuf   imported size 1 x 1 format 34324241 planes 1 modifiers 3000000004fe010
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Serialize() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ImportSurfaceDescriptor() UID 1 size 1 x 1
[Parent 3974: CanvasRenderer]: D/Dmabuf   imported size 1 x 1 format 34324241 planes 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::CreateTexture() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ReleaseTextures() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurface::ReleaseDMABuf() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ReleaseTextures() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ReleaseTextures() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurface::ReleaseDMABuf() UID 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Create() from EGLImage UID = 3
[Parent 3974: CanvasRenderer]: D/Dmabuf   imported size 1024 x 1024 format 34324241 planes 1 modifiers 300000000cdb014
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Serialize() UID 3
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::ImportSurfaceDescriptor() UID 3 size 1024 x 1024
[Parent 3974: CanvasRenderer]: D/Dmabuf   imported size 1024 x 1024 format 34324241 planes 1
[Parent 3974: CanvasRenderer]: D/Dmabuf DMABufSurfaceRGBA::Serialize() UID 3
[Child 4164: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: UseDMABuf 1 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled

nvidia-driver-515


nvidia-driver-520 is an alias for nvidia-driver-525 = comment 23
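When comparing logs like these across driver versions, the interesting part is the three flags on the `IsDMABufWebGLEnabled` lines. A minimal sketch for pulling them out, assuming the line format shown in the logs above; `parse_dmabuf_flags` is a hypothetical helper, not a Firefox tool. A flag whose value is cut off, as on the last log line of the 510 run, is simply left out of the result.

```python
import re

# Flag names taken from the log lines above; each is followed by a 0/1 value.
FLAG_RE = re.compile(
    r"(UseDMABuf|mUseWebGLDmabufBackend|widget_dmabuf_webgl_enabled) (\d)"
)


def parse_dmabuf_flags(line: str):
    """Return {flag: bool} for an IsDMABufWebGLEnabled log line, else None."""
    if "IsDMABufWebGLEnabled" not in line:
        return None
    return {name: value == "1" for name, value in FLAG_RE.findall(line)}


driver_470 = ("[Child 13169: Main Thread]: D/Dmabuf nsDMABufDevice::IsDMABufWebGLEnabled: "
              "UseDMABuf 0 mUseWebGLDmabufBackend 1 widget_dmabuf_webgl_enabled 1")
print(parse_dmabuf_flags(driver_470))
```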

Crash Signature: [@ NvGlEglGetFunctions] → [@ NvGlEglGetFunctions] [@ libnvidia-eglcore.so.470.161.03@0xf710b0 ] [@ libnvidia-eglcore.so.510.108.03@0xe4bd91 ] [@ libnvidia-eglcore.so.515.86.01@0xe5b281 ]
Duplicate of this bug: 1801892
See Also: 1801892
Crash Signature: [@ NvGlEglGetFunctions] [@ libnvidia-eglcore.so.470.161.03@0xf710b0 ] [@ libnvidia-eglcore.so.510.108.03@0xe4bd91 ] [@ libnvidia-eglcore.so.515.86.01@0xe5b281 ] → [@ NvGlEglGetFunctions] [@ libnvidia-eglcore.so.470.161.03@0xf710b0 ] [@ libnvidia-eglcore.so.510.108.03@0xe4bd91 ] [@ libnvidia-eglcore.so.515.86.01@0xe5b281 ] [@ libnvidia-eglcore.so.510.85.02@0xe4bf31] [@ libnvidia-eglcore.so.510.85.02@0xe4bfbd ] …
Duplicate of this bug: 1761644
Summary: Crash in [@ NvGlEglGetFunctions] after suspend&resume → Crash in [@ NvGlEglGetFunctions] after suspend&resume if DMABUF and THREADSAFE_GL are enabled. Fixed in next Nvidia driver 530 release

I just want to say thank you to everyone who is helping resolve this most-heinous bug, most notably Darkspirit for his trouble shooting, and Erik Kurzinger for his work at Nvidia providing the actual fix. Can't wait for my distribution to roll out 530-series NVIDIA drivers with the actual fix baked in. Cheers!

Flags: needinfo?(stransky)

Has anyone tried enabling the GPU process? I have layers.gpu-process.enabled set to true and this solves the problem completely.
Yes, the GPU process crashes periodically, but it doesn't bring the whole browser down.

This is my example of the same crash - https://crash-stats.mozilla.org/report/index/66b6c18c-11c4-4896-8fa4-6d0220230304

It appears our blocklisting efforts have successfully brought the crash rate down.

(In reply to V. Korn from comment #36)

Has anyone tried enabling the GPU process? I have layers.gpu-process.enabled set to true and this solves the problem completely.
Yes, the GPU process crashes periodically, but it doesn't bring the whole browser down.

This is my example of the same crash - https://crash-stats.mozilla.org/report/index/66b6c18c-11c4-4896-8fa4-6d0220230304

Unfortunately the GPU process as currently implemented on Linux won't work with Wayland, so we haven't invested effort in shipping it. My understanding is to make it work with Wayland, we would need to do something similar to what we do on Android, where we proxy to the parent process at the final stages of the compositing pipeline.

Flags: needinfo?(aosmond)

(In reply to Erik Kurzinger from comment #26)

The driver bug can be fixed with a fairly low-risk change, so I might be able to get it into the next 530 release which should be fairly soon. In terms of a work-around until then, one option would be to force the use of GLX since the bug is specific to EGL. Another option would be to enable the aforementioned feature that preserves video memory across suspend / resume, which should have the side-effect of avoiding the problem. That can be done by setting the option "NVreg_PreserveVideoMemoryAllocations=1" for the nvidia kernel module.

Darkspirit's earlier comment mentioned some other settings that appear to prevent the crash, although note that it's basically a use-after-free error and therefore somewhat non-deterministic. So it could be the case that they just work due to luck... I'm not sure. The two things mentioned above should work for certain, though.

Hey Erik, Can you confirm if the fix for this bug made it into the 530 release?

https://www.nvidia.com/download/driverResults.aspx/200481/en-us/
Version: 530.41.03

Thanks

Flags: needinfo?(ekurzinger)

No, the fix turned out to be more complicated than I initially thought and I was unable to get it checked in in time for the release. Apologies.

Flags: needinfo?(ekurzinger)

Thanks Erik.

Whoever is responsible for bug https://bugzilla.mozilla.org/show_bug.cgi?id=1820055: it may need to be adjusted, as it is a workaround hard-coded for NVIDIA driver releases 530 and lower.

Duplicate of this bug: 1818178

Donald,

You're going to have to re-open Bug #1820055 to extend the workaround beyond release 530. Per Erik's comment, the fix did not make it into 530, so without changing the workaround the crash will return.

Flags: needinfo?(dmeehan)

Redirecting needinfo to :aosmond for a follow-up on comment 39 - comment 42

Flags: needinfo?(dmeehan) → needinfo?(aosmond)

Landing a fix in bug 1824778. When we have a confirmed working driver, we can unblock.

Flags: needinfo?(aosmond)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

@aosmond so what's the state with this bug? Can you annotate it with fixed/etc if it is so?

Flags: needinfo?(aosmond)

Kelsey, this bug is on hold awaiting a driver fix from Nvidia, which makes Erik Kurzinger from Nvidia the primary point of contact.

After the Nvidia fix comes in then dmabuf can be re-enabled.

Any progress, Erik?

Flags: needinfo?(ekurzinger)

Yep, once we hear from NVIDIA on a specific driver version with the fix, we can make a more fine tuned blocklist rule.

Flags: needinfo?(aosmond)

Sorry for the slow response. The fix will be in the 545 driver release, that'll be the next major version after 535 which went public recently.

Flags: needinfo?(ekurzinger)

Dropping a note that 535 remains the current distributed driver release at this time. This report will remain pending on the new 545 release, but I'm setting it to stalled to indicate no further action can be taken on it until this external event occurs. (If there's a more appropriate keyword for that kind of scenario, please correct me)

Keywords: stalled

(In reply to Erik Kurzinger from comment #49)

Sorry for the slow response. The fix will be in the 545 driver release, that'll be the next major version after 535 which went public recently.

It's released on Windows and still in beta for Linux, but 545 has been released. Can you confirm whether this fix made it into the 545 series of drivers, Erik?

Flags: needinfo?(ekurzinger)

Yes, this should be fixed in the 545 beta release.

Flags: needinfo?(ekurzinger)

Awesome, I guess the next step is to re-enable the code that implements DMABUF and THREADSAFE_GL with a minimum version of NVIDIA 545 driver.

Flags: needinfo?(aosmond)
See Also: → 1860048

Okay, let's enable this on nightly and early beta and see what shakes out. When the driver itself hits release, we can look at shipping to release as well.

Flags: needinfo?(aosmond)

Pop!_OS just updated to the NVIDIA 545 driver.
Pop!_OS is an X11 windowing system distribution.

Just tested with nightly 20231109165012 via https://packages.mozilla.org/apt

No crash when resuming from suspend anymore. Used to be 100% crash prior to driver numbering blacklist.

Tested several times and it seems good, but I assume you'll see an uptick on telemetry that I won't be privy to if others still get the crash.

Summary: Crash in [@ NvGlEglGetFunctions] after suspend&resume if DMABUF and THREADSAFE_GL are enabled. Fixed in next Nvidia driver 530 release → Crash in [@ NvGlEglGetFunctions] after suspend&resume if DMABUF and THREADSAFE_GL are enabled. Fixed in Nvidia driver 545

I have Nvidia driver 545 and on top of that, never suspend my computer so the issue was never relevant to me, but still can't enable DMABUF. The preference is called "force-enabled", but Firefox 121.0b9 ignores me and disables the functionality. I am the owner of my computer, so why does setting "force-enabled" not force enabling the functionality?

While we wait out the months it takes for Nvidia driver 545 to be released beyond NFB and beta, I suggest changing settings which are currently named "force-enabled" to "suggest-enabled" and adding new "force-enabled" settings which serve to actually force enablement. Alternatively, leave the "force-enabled" settings and add a "widget.dmabuf.override-knownissue-blocklist" setting. The user should get the final say in what happens on their computer.

(In reply to lexlexlex from comment #56)

I have Nvidia driver 545 and on top of that, never suspend my computer so the issue was never relevant to me, but still can't enable DMABUF. The preference is called "force-enabled", but Firefox 121.0b9 ignores me and disables the functionality. I am the owner of my computer, so why does setting "force-enabled" not force enabling the functionality?

While we wait out the months it takes for Nvidia driver 545 to be released beyond NFB and beta, I suggest changing settings which are currently named "force-enabled" to "suggest-enabled" and adding new "force-enabled" settings which serve to actually force enablement. Alternatively, leave the "force-enabled" settings and add a "widget.dmabuf.override-knownissue-blocklist" setting. The user should get the final say in what happens on their computer.

Hi Lexlexlex,

I reviewed the code because of the issue you are experiencing, but it looks like we implemented widget.dmabuf.force-enabled as one would expect [1]. There are two cases, [2] and [3], where we might fail at runtime to enable DMABuf despite the pref. I wonder if you are hitting one of those two cases. If you could attach your about:support, then I would be happy to take a look and see if I can figure out why it isn't working for you. Rest assured our intent is to allow the user to turn it on if they want.

[1] https://searchfox.org/mozilla-central/rev/b60fe683f005785706074b8cd8a6dcbc363936e0/gfx/thebes/gfxPlatformGtk.cpp#217
[2] https://searchfox.org/mozilla-central/rev/b60fe683f005785706074b8cd8a6dcbc363936e0/gfx/thebes/gfxPlatformGtk.cpp#224
[3] https://searchfox.org/mozilla-central/rev/b60fe683f005785706074b8cd8a6dcbc363936e0/gfx/thebes/gfxPlatformGtk.cpp#239

Attached file about:support
My about:support result explicitly states in the decision log that DMABUF is "blocklisted", with a link to this issue ticket as justification:

default | available
user | force_enabled | Force enabled by pref
env | blocklisted | Blocklisted by gfxInfo | Blocklisted due to known issues: bug 1788573

Maybe the UI could use some improvements given it is unclear, but the status field, third from the top, is force_enabled, which means it should be used. You can enable logging and watch an H264 or VP9 video to prove it (MOZ_LOG="Dmabuf:5,DmabufRef:5" firefox or similar via about:logging).

It is true there is an entry about the blocklist, but the UI is misleading. We added the full decision log so that we could understand fully why some users get something enabled or disabled. The order of precedence is runtime/*, user/force_enabled, env/*, user/*, default/*. Since you have a user/force_enabled entry, it takes precedence over the env/blocklisted entry.

        {
          "name": "DMABUF",
          "description": "DMABUF",
          "status": "force_enabled",
          "log": [
            {
              "type": "default",
              "status": "available"
            },
            {
              "type": "user",
              "status": "force_enabled",
              "message": "Force enabled by pref"
            },
            {
              "type": "env",
              "status": "blocklisted",
              "failureId": "FEATURE_FAILURE_BUG_1788573",
              "message": "Blocklisted by gfxInfo"
            }
          ]
        },
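The order of precedence described above can be applied to this log mechanically. The sketch below is illustrative only and is not Firefox's feature-state code; the names `rank` and `effective_status` are hypothetical, and picking the first entry when several share a rank is an assumption.

```python
import json

# Decision log entries copied from the about:support excerpt above.
DMABUF_LOG = json.loads("""
[
  {"type": "default", "status": "available"},
  {"type": "user", "status": "force_enabled", "message": "Force enabled by pref"},
  {"type": "env", "status": "blocklisted",
   "failureId": "FEATURE_FAILURE_BUG_1788573", "message": "Blocklisted by gfxInfo"}
]
""")


def rank(entry: dict) -> int:
    """Lower rank wins: runtime/*, user/force_enabled, env/*, user/*, default/*."""
    kind, status = entry["type"], entry["status"]
    if kind == "runtime":
        return 0
    if kind == "user" and status == "force_enabled":
        return 1
    if kind == "env":
        return 2
    if kind == "user":
        return 3
    if kind == "default":
        return 4
    return 5  # unknown entry types lose to everything else (assumption)


def effective_status(log: list[dict]) -> str:
    """Return the status of the highest-precedence entry in the log."""
    return min(log, key=rank)["status"]


print(effective_status(DMABUF_LOG))
```

Dropping the `user` entry from the log makes the `env` entry win, which matches the blocklisted state users see without the pref.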

FWIW, I filed bug 1870535 for about:support being confusing.

Okay, thanks for that analysis and bug 1870535. The UI should definitely be made clearer, ideally by simply displaying the final decision in the decision log. It could have a field that says "Decision: Enabled" or something like that.

That being said, the reason I have thought it was not actually enabled is that I have been trying to figure out why HEVC (H265) hardware acceleration is disabled. There's no reason I can find. I have an Nvidia GTX 1080 Ti and all the packages I need, but it's still disabled and there's no decision log explaining it. The decision log is very nice, but I would appreciate it even further if it logged more decisions, like why exactly HEVC hardware acceleration is disabled.

Thanks for the info, and I'll stop posting since I know this is not a support forum. I wanted to give some insight into how this bug report is being used, but it seems I didn't realize what was actually happening.

Your codec support from about:support indicates we think you should be using HW decoding:

"codecSupportInfo": "H264 SW HW\nVP9 SW HW\nVP8 SW\nAV1 SW\nHEVC NONE\nTheora SW\nAAC SW\nMP3 SW\nOpus SW\nVorbis SW\nFLAC SW\nWave SW"

I'm happy to continue investigating if you are not getting hardware decoding. Next steps would be:

  1. Create a fresh profile with the necessary prefs flipped, visit profiler.firefox.com to install the profiler addon, and restart.
  2. Go to about:logging and select the "Media" preset
  3. Add ,Dmabuf:5,DmabufRef:5 to the end of the logging modules text box and hit Set Log Modules.
  4. Click the arrow next to the new profiler icon, and select the "Media" preset for profiling.
  5. Click on the profiler icon to start profiling.
  6. Visit a website which should have H264 HW decoding and play the video for 15 seconds, then stop.
  7. Click on the profiler icon to stop profiling.
  8. When it opens the new tab, click on "Upload Local Profile" and check all the boxes.
  9. File a new bug explaining your problem and give us the link it generated for the profile.

Oh wait, you said H265, not H264. Yeah it didn't detect support for that. Not sure why.
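The `codecSupportInfo` string is easier to read once split into one entry per codec. A minimal sketch, assuming each line has the form `CODEC MODE...` as in the value quoted above and that `NONE` means no decoder at all; `parse_codec_support` is a hypothetical helper.

```python
def parse_codec_support(info: str) -> dict[str, set[str]]:
    """Map each codec name to its set of decode modes ('SW', 'HW')."""
    support = {}
    for line in info.split("\n"):
        if not line.strip():
            continue  # tolerate blank lines
        codec, *modes = line.split()
        # 'NONE' is treated as "no support", so it maps to an empty set.
        support[codec] = {mode for mode in modes if mode != "NONE"}
    return support


# Value copied from the about:support excerpt quoted above.
INFO = ("H264 SW HW\nVP9 SW HW\nVP8 SW\nAV1 SW\nHEVC NONE\nTheora SW\n"
        "AAC SW\nMP3 SW\nOpus SW\nVorbis SW\nFLAC SW\nWave SW")

support = parse_codec_support(INFO)
print(sorted(codec for codec, modes in support.items() if "HW" in modes))
print(support["HEVC"])
```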

(Offtopic)
(In reply to lexlexlex from comment #62)

why HEVC (H265) hardware acceleration is disabled. There's no reason I can find.

IIUC:
GPU vendors don't buy codec patent licenses; whoever puts in the last missing software piece must do so.
H265/HEVC isn't compiled and not utilized by Linux Firefox (bug 1857097). Mozilla would have to pay license fees to multiple H265 patent pools. (In theory, Raspberry Pi could integrate it in their Firefox build for their licensed devices.) Windows users have to buy HEVC Video Extensions to make it work.
bug 1601815 is the alternative to HEVC.
(H264/AVC: Many patents expired, and if the user didn't manually install ffmpeg, Linux Firefox downloads the OpenH264 decoder from Cisco who pay a capped fee for users' h264 licenses.)

545 has been shipping for a couple of months by now. Is it time to drop the EARLY_BETA_OR_EARLIER guard?

Flags: needinfo?(aosmond)