Crash exiting fullscreen (F11) with gfx.canvas.accelerated set on Nvidia + Wayland
Categories
(Core :: Graphics, defect)
People
(Reporter: jplebreton, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: crash)
Attachments
(2 files: 42.18 KB text/plain; 1.84 KB patch)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0
Steps to reproduce:
- Run Firefox on a GNOME Wayland session with an Nvidia GPU.
- Ensure gfx.canvas.accelerated is set to true.
- Open a page that uses Canvas, e.g. https://webglsamples.org/aquarium/aquarium.html
- Press F11 to make the browser full screen, then press it again to exit full screen. Observe: Firefox doesn't crash.
- Wait a while; leave the Canvas tab open and use the browser for other stuff. (Sorry this is underspecified; I don't know what exactly causes it to start happening.)
- With any tab (not just the Canvas tab) open, press F11 once to fullscreen, then press it again.
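
For anyone trying to reproduce this, the setup part of the steps above could be scripted roughly as follows. This is only a sketch and not part of the original report: the helper names and the throwaway profile directory are hypothetical, and it assumes the usual `user.js` mechanism for setting prefs and the `MOZ_ENABLE_WAYLAND=1` environment variable for selecting the Wayland backend. It builds the launch command but does not start Firefox itself.

```python
import os
import tempfile
from pathlib import Path


def prepare_profile(profile_dir: Path) -> Path:
    """Write a user.js that enables accelerated canvas for this profile."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    user_js = profile_dir / "user.js"
    user_js.write_text('user_pref("gfx.canvas.accelerated", true);\n')
    return user_js


def launch_command(profile_dir: Path, url: str):
    """Return (argv, env) for starting Firefox on Wayland with that profile."""
    env = dict(os.environ, MOZ_ENABLE_WAYLAND="1")
    argv = ["firefox", "--profile", str(profile_dir), url]
    return argv, env


if __name__ == "__main__":
    # Hypothetical throwaway profile; a real test might reuse an existing one.
    profile = Path(tempfile.mkdtemp(prefix="fx-repro-"))
    prepare_profile(profile)
    argv, _env = launch_command(
        profile, "https://webglsamples.org/aquarium/aquarium.html"
    )
    print("MOZ_ENABLE_WAYLAND=1", " ".join(argv))
```

The F11 toggling and the waiting period still have to be done by hand; the script only covers the pref and the session type.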
Actual results:
The browser crashes and the standard Firefox crash report dialog comes up.
Expected results:
The browser shouldn't crash when exiting fullscreen.
A few findings from this issue I filed earlier to try to narrow down the causes / related phenomena: https://github.com/Robbendebiene/Gesturefy/issues/679
- This also happens with extensions that use a Canvas, e.g. Gesturefy's use of it for drawing the user's gesture trail on screen above the page in Firefox. Turning off the trail made the crash stop happening.
- This only happens in Wayland; when I start Firefox in an X11 session, the crash doesn't happen.
- This doesn't happen when gfx.canvas.accelerated is set to false.
Hence the specific combination of terms in the bug summary.
Given that Nvidia's Wayland support is still maturing, I wouldn't be surprised if this is a bug on their end, but it seemed logical to file this bug first to determine that it wasn't a Firefox issue - as a game developer I have plenty of other applications on my system that create fullscreen (all the various kinds - borderless, exclusive, etc) windows, but only Firefox exhibits this crash.
Happy to provide additional information as needed.
Reporter
Comment 1•2 years ago
One last point I forgot to note above: I'm running the Nvidia binary drivers, version 535.98.
Comment 2•2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3•2 years ago
Thank you for the bug report. Could you please go to about:support, click "copy text to clipboard" and attach the contents to this bug?
Could you please also post a link to a crash report from about:crashes?
cc Lee for accelerated canvas crash
Comment 4•2 years ago
Can you please submit a crash report from about:crashes? Without having a crash stack, it's pretty difficult to diagnose what's going on here.
Reporter
Comment 5•2 years ago
As requested, here are the contents of about:support from a minimal profile that reproduces the crash.
Reporter
Comment 6•2 years ago
And here's a crash report link:
https://crash-stats.mozilla.org/report/index/fc09b7fc-8296-481b-85f4-6c0450230829
Comment 7•2 years ago
(In reply to jplebreton from comment #6)
> And here's a crash report link:
> https://crash-stats.mozilla.org/report/index/fc09b7fc-8296-481b-85f4-6c0450230829
Martin, it looks like something is getting double-freed inside a wayland resize callback?
Comment 8•2 years ago
To me this clearly looks like a bug in the Nvidia driver, a race condition.
If I read the stack right, the driver crashes because it handles updated linux dmabuf feedback tranches in the resize callback. The new set of tranches (without a scanout tranche) is expected to be sent when exiting fullscreen on GNOME, so that's not a surprise.
The fact that it has so far only been observed with gfx.canvas.accelerated enabled is likely due to timing differences, as reallocating/resizing buffers on the GPU takes time.
Comment 9•2 years ago
Just a guess, but the driver calls the function create_surface_context, indicating that it destroyed the old surface context, which may have contained a reference to the format list to be destroyed in wlEglDestroyFormatSet, given that tranche data is surface related.
Comment 10•1 year ago
The bug has a crash signature, thus the bug will be considered confirmed.
Comment 13•1 year ago
The user in bug 1850285 provided crash reports matching this signature. The STR are: watch a video in fullscreen, then exit fullscreen.
Comment 14•1 year ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
:stransky, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Comment 15•1 year ago
Bug 1844543 has landed, so it would be great if the crash rate is lower now.
Comment 16•1 year ago
Bug 1850696 may be another cause of it: if we're running out of file descriptors in WebGL, anything can happen.
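
One rough way to check whether a process is getting close to its file descriptor limit on Linux is sketched below. It is an illustration only, not something taken from bug 1850696: the helper names are hypothetical, and it assumes `/proc/<pid>/fd` is readable and that `resource.prlimit` may query the target process (same user, Linux only).

```python
import os
import resource


def count_open_fds(pid: int) -> int:
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid: int):
    """Return (open descriptors, soft RLIMIT_NOFILE) for the given process."""
    soft, _hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    return count_open_fds(pid), soft


if __name__ == "__main__":
    # Demonstrated on the current process; substitute the PID of the
    # Firefox process of interest when investigating a real session.
    used, soft = fd_usage(os.getpid())
    print(f"{used} descriptors open, soft limit {soft}")
```

If the first number creeps toward the second over a long session, descriptor exhaustion becomes a plausible contributor.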
Comment 17•1 year ago
As mentioned in Comment 8, I'm pretty sure this is simply an Nvidia driver bug, specifically in their EGL Wayland implementation. There is nothing we can do about it apart from pinging Nvidia devs (or disabling HW WebRender).
Arthur, could you maybe have a look here?
Reporter
Comment 18•1 year ago
I created a thread on the Nvidia Linux driver support forums that refers back to this bug report: https://forums.developer.nvidia.com/t/crash-exiting-fullscreen-in-firefox-under-wayland/266666
Hopefully this makes the driver team there aware of this and/or someone there will be able to confirm or refute the suggestion here that it's a driver issue.
Comment 19•1 year ago
Robert, I'm not qualified to help, but Erik, who I believe you've interacted with before, should be.
Comment 20•1 year ago
Earlier I investigated what is almost certainly the same crash in https://bugzilla.mozilla.org/show_bug.cgi?id=1840360. Unfortunately I was unable to reproduce it and couldn't think of a plausible explanation by inspection.
I still haven't managed to get a local repro. Strangely, on my system it looks like mutter doesn't want to enable direct scan-out for Firefox even when it's full-screen, and hence it never sends any dmabuf feedback events. Other full-screen EGL applications do get scanned out, so I'm not sure what's going on there.
However, while I was poking around I noticed that there actually is a pretty bad bug in egl-wayland's dmabuf feedback handling, and I think it could potentially explain what folks have been seeing. The problem is that, after a window is resized (by wl_egl_window_resize), we immediately try to dispatch the queue that the feedback events are received on, but the compositor won't send those events until after the next wl_surface_commit when we attach a buffer with the new size (meaning after the next eglSwapBuffers call).
So when you make a window full-screen, egl-wayland doesn't actually receive a new feedback. When the compositor does send one later, it will just sit in the wl_event_queue. That is until you un-full-screen the window. Then it gets processed. Thus, we're effectively always one-feedback-behind where we should be.
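
The "one feedback behind" effect can be illustrated with a toy model. This is a simplified sketch and not egl-wayland code: the classes are hypothetical, a plain `deque` stands in for the `wl_event_queue`, and the commit that follows a resize is folded into the `resize` call for brevity.

```python
from collections import deque


class ToyCompositor:
    """Sends feedback only when a buffer of a new size is committed."""

    def __init__(self, queue):
        self.queue = queue
        self.last_size = None

    def commit(self, size):
        if size != self.last_size:
            self.last_size = size
            self.queue.append(f"feedback for {size}")


class ToyClient:
    def __init__(self):
        self.queue = deque()  # stands in for the wl_event_queue
        self.compositor = ToyCompositor(self.queue)
        self.applied = []  # feedback the client has actually processed

    def resize(self, size):
        # Dispatch happens immediately after the resize ...
        while self.queue:
            self.applied.append(self.queue.popleft())
        # ... but the compositor only reacts to the commit that follows.
        self.compositor.commit(size)


client = ToyClient()
client.resize("windowed")
client.resize("fullscreen")
print(client.applied)      # ['feedback for windowed']
print(list(client.queue))  # ['feedback for fullscreen']
```

In this model the fullscreen feedback is only applied by the next `resize` call, that is, when the window leaves fullscreen again.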
Therefore, my theory as to what ends up causing the double-free is a surface getting destroyed while there's still that one pending batch of feedback events in the queue from the most recent full-screening / un-full-screening. Then when some other surface eventually dispatches those, it causes a crash.
One possible fix would be to use the per-surface queue for dmabuf feedback events instead of the per-display queue which we currently do. The per-surface queue gets dispatched every frame, so that should ensure they always get processed promptly.
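
The difference between the two queueing strategies can be sketched with another toy model. Again, this is an assumption-laden illustration rather than the actual patch: `Surface`, `dispatch`, and `run` are hypothetical names, and a "stale" event here stands for what would be a use-after-free or double-free in the C implementation.

```python
from collections import deque


class Surface:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.queue = deque()  # per-surface queue, dispatched every frame


def dispatch(queue):
    """Process all pending events; return names of targets already destroyed."""
    stale = []
    while queue:
        target = queue.popleft()
        if not target.alive:
            stale.append(target.name)  # would touch freed memory in C
    return stale


def run(per_surface: bool):
    display_queue = deque()  # shared per-display queue
    a, b = Surface("a"), Surface("b")
    # The compositor sends feedback for surface a after a's last resize.
    (a.queue if per_surface else display_queue).append(a)
    if per_surface:
        dispatch(a.queue)  # a's own per-frame dispatch handles it promptly
    a.alive = False  # surface a is destroyed
    # Later, surface b is resized and dispatches the queue it listens on.
    return dispatch(b.queue if per_surface else display_queue)


print(run(per_surface=False))  # ['a']
print(run(per_surface=True))   # []
```

With the shared queue, surface b ends up processing an event addressed to the destroyed surface a; with per-surface queues, a's event is consumed while a is still alive.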
Alas, I can't really confirm the above. If anyone who is able to reproduce the crash would be willing to build egl-wayland (https://github.com/NVIDIA/egl-wayland) with the attached patch applied I would be grateful.
Comment 21•1 year ago
I'd also like to update this bug: I tested Erik's patch on top of the master branch of egl-wayland on my system, and after at least a day of repeatedly trying to reproduce the crash, I never hit it. So I would say the patch fixes this.
Reporter
Comment 22•1 year ago
I wanted to try out Erik's patch on my system. I'm on an Arch-based distro, so the first step was to replace the official repo's egl-wayland package (last updated 2023-06-06) with the locally built egl-wayland-git package from the AUR. Without changing anything about the code from the latest git main branch (i.e. not having yet applied Erik's patch), this change appears to have fixed the crash on my system. So it's possible that the two commits made to egl-wayland's git repository since its 1.1.12 release was tagged fixed the issue? As it's locally built, there are many other variables that could also explain the change, but I'm offering the info here in case it's useful.
Comment 23•1 year ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 24•1 year ago
It may no longer be a topcrasher, but I've been hitting the wlEglDestroyFormatSet crash signature daily since switching to Wayland.
No fullscreen mode is involved in my case, however.
Comment 25•1 year ago
Yeah, it crashes often here too, even in a full-screen Google Meet.
Comment 26•1 year ago
tekstr1der / Alexandre: Could you try the patch mentioned in Comment 20 / 21 / 22? (given that the commits landed already you can simply clone https://github.com/NVIDIA/egl-wayland/commits/master)
There doesn't appear to be a release with them yet unfortunately.
Comment 27•1 year ago
(In reply to Robert Mader [:rmader] from comment #26)
> tekstr1der / Alexandre: Could you try the patch mentioned in Comment 20 / 21 / 22? (given that the commits landed already you can simply clone https://github.com/NVIDIA/egl-wayland/commits/master)
Sure. I built egl-wayland 1.1.12+r3+g3f9889c, and will see how things go. Thanks!
Reporter
Comment 28•1 year ago
Also might be worth trying to repro with the just-released 545 beta drivers: https://www.nvidia.com/download/driverResults.aspx/212964/en-us/
Comment 29•1 year ago
I posted a new release of egl-wayland today, 1.1.13, which should contain the fix.
Comment 30•1 year ago
Thanks Erik!
Comment 31•1 year ago
Marking as fixed by https://github.com/NVIDIA/egl-wayland/releases/tag/1.1.13
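
For users wondering whether their installed egl-wayland is new enough, a version string can be compared against 1.1.13 along these lines. This is a sketch under assumptions: the function names are hypothetical, how the version string is obtained is left to the distro's package manager, and the `+rN+g<hash>` suffix format is taken from the snapshot build mentioned in comment 27. Snapshot builds on top of 1.1.12 are deliberately reported as needing a manual check, since this thread does not pin down which commit count includes the fix.

```python
import re

FIXED_RELEASE = (1, 1, 13)


def parse_version(text: str):
    """Split '1.1.12+r3+g3f9889c' into ((1, 1, 12), 3); plain tags give 0."""
    m = re.fullmatch(
        r"(\d+)\.(\d+)\.(\d+)(?:\+r(\d+)\+g[0-9a-f]+)?", text.strip()
    )
    if m is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    base = tuple(int(part) for part in m.group(1, 2, 3))
    return base, int(m.group(4) or 0)


def fix_status(text: str) -> str:
    """Classify a version as 'fixed', 'unfixed', or 'check manually'."""
    base, commits_since_tag = parse_version(text)
    if base >= FIXED_RELEASE:
        return "fixed"
    if base == (1, 1, 12) and commits_since_tag > 0:
        return "check manually"  # git snapshot between 1.1.12 and 1.1.13
    return "unfixed"


for version in ("1.1.12", "1.1.12+r3+g3f9889c", "1.1.13"):
    print(version, "->", fix_status(version))
```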
Comment 33•1 year ago
Copying crash signatures from duplicate bugs.