Wayland WebRTC main process crash in [@ wl_list_insert] due to broken Nvidia driver setup
Categories
(Core :: WebRTC, defect)
Tracking
()
Version | Tracking | Status |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox110 | --- | unaffected |
firefox111 | --- | wontfix |
firefox112 | --- | fixed |
People
(Reporter: csasca, Assigned: jgrulich)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(7 files)
Crash report: https://crash-stats.mozilla.org/report/index/50c59e5a-c757-4f41-bc3f-5c6290230227
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libwayland-client.so.0 wl_list_insert src/wayland-util.c:47
1 libwayland-client.so.0 wl_display_read_events src/wayland-client.c:1705
2 libwayland-client.so.0 wl_display_dispatch_queue src/wayland-client.c:1912
3 libwayland-client.so.0 wl_display_roundtrip_queue src/wayland-client.c:1358
4 libnvidia-egl-wayland.so.1 libnvidia-egl-wayland.so.1@0x80dc
5 libEGL_nvidia.so.0 NvEglwlaf47906in
6 libEGL_nvidia.so.0 NvEglwlaf47906in
7 libxul.so webrtc::EglDmaBuf::EglDmaBuf third_party/libwebrtc/modules/desktop_capture/linux/wayland/egl_dmabuf.cc:375
8 libxul.so std::make_unique<webrtc::EglDmaBuf, > /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7/bits/unique_ptr.h:821
8 libxul.so webrtc::SharedScreenCastStreamPrivate::StartScreenCastStream third_party/libwebrtc/modules/desktop_capture/linux/wayland/shared_screencast_stream.cc:395
Platforms
- Affected: Ubuntu 22.04
- Unaffected: Windows and macOS
Steps to Reproduce
- Launch Firefox
- Access this sample page and select screen capture
- Select allow on the popup and then select any window or just fullscreen
- Press stop and go to screen capture again and select any window or fullscreen again
Notes
- The crash can be seen in the added attachment
- Happens on Firefox 112.0a1 and 111.0b6
- Feel free to change the component to the right one if this isn't the correct one.
- The OS is Ubuntu 22.04 with Wayland session active
Comment 1•2 years ago
The bug is marked as tracked for firefox111 (beta) and tracked for firefox112 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.
:jimm, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 2•2 years ago (Reporter)
Found the regression range between 2023-01-19 and 2023-01-20. Additionally, on the unaffected builds screen sharing works as expected and does not show a black screen like in the attachment.
Last good revision: 5bddea08b7b1db3d1679ab609c591cb8870b9a07
First bad revision: a9ca4976b1fd5521247ec2c9c69ca14f02324881
Comment 3•2 years ago
Set release status flags based on info from the regressing bug 1790496
Comment 4•2 years ago
Crash on Wayland with NVIDIA proprietary drivers. WebRTC/PipeWire doesn't check the NVIDIA driver version, so it creates the GL/dmabuf setup even on unsupported devices/configs.
Catalin, can you attach your about:support page please?
Thanks.
Comment 5•2 years ago
Jan,
I'm surprised that EglDmaBuf::EglDmaBuf() creates GL/dmabuf blindly without any driver check:
Is the same code used in Chrome/Chromium?
Comment 6•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #5)
> Jan,
> I'm surprised that EglDmaBuf::EglDmaBuf() creates GL/dmabuf blindly without any driver check:
It doesn't use dmabufs if it doesn't find the required extension. Here [1] it returns an empty list of supported modifiers, and without any modifier announced (btw. DRM_FORMAT_MOD_INVALID is for modifier-less buffers) it should fall back to memfd PipeWire buffers.
Also, I would expect the compositor not to advertise dmabuf support in such a case, so even then it should fall back to memfd buffers.
Or do you have any other check in mind?
> is the same code used in Chrome/Chromium?
Yes, that code hasn't been touched upstream for a while and I haven't seen this crash at all with Chromium in the past.
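A minimal sketch of the fallback described in this comment, assuming a hypothetical helper `ChooseBufferKind` and enum `BufferKind` (neither is a libwebrtc name). When the EGL side reports no modifiers at all, the stream would be negotiated with memfd buffers; any announced modifier, including the one used for modifier-less buffers, keeps dmabuf in play.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the buffer kinds a PipeWire consumer can request.
enum class BufferKind { kMemFd, kDmaBuf };

// Decide which buffer kind to negotiate for one pixel format.
// `modifiers` is the list of DRM format modifiers reported by the EGL side;
// an empty list means the required extension was missing or the query failed.
BufferKind ChooseBufferKind(const std::vector<uint64_t>& modifiers) {
  return modifiers.empty() ? BufferKind::kMemFd : BufferKind::kDmaBuf;
}
```

The point of the design is that the decision depends only on what was actually announced, so a missing extension degrades to memfd instead of failing.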
Comment 7•2 years ago (Reporter)
Sure thing, here's the about:support from 111.0b8. Please let me know if more info is needed!
Comment 8•2 years ago (Assignee)
OK, I can now see what's going on; I had misunderstood the question. I'm currently investigating it.
Is there a way to enable debug information from WebRTC so all RTC_LOG(LS_ERROR) will be seen?
Comment 9•2 years ago (Assignee)
There is definitely some mismatch here: according to your about:support output Firefox uses your Intel card, while the backtrace leads into NVIDIA code. For that reason it would be really helpful to get the logs from WebRTC if possible. Alternatively, you can replace the RTC_LOG(LS_ERROR) calls with something else to have them printed. We need all logs from third_party/libwebrtc/modules/desktop_capture/linux/wayland/egl_dmabuf.cc.
Also, can you include the output from "eglinfo"? It should be in the egl-utils package or something like that. At least it's there in Fedora.
Comment 10•2 years ago (Reporter)
Yes, it may be related, as this is a laptop with an Intel iGPU and a dedicated Nvidia GTX 960M (which is set to be used only on demand in the Nvidia control panel).
I've attached the eglinfo output; it may be of some help.
As for egl_dmabuf.cc, I couldn't find it anywhere on the system unfortunately (using the search in files option).
Comment 11•2 years ago
Actually the egl_dmabuf code works as expected, but it doesn't consider the XWayland scenario or a multi-GPU setup.
Jan, I expect you want to upstream that code, right? In that case we should use general GTK calls to check the display type in use.
You should use something like GdkIsWaylandDisplay():
https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/widget/gtk/WidgetUtilsGtk.cpp#103
To check whether Firefox is running on Wayland. Otherwise you should use X11/EGL.
Also, EGL_PLATFORM_WAYLAND_KHR/EGL_PLATFORM_GBM_KHR is not an optimal solution here, but I don't know what code Google uses for it.
eglGetPlatformDisplay(EGL_PLATFORM_WAYLAND_KHR,...) uses the default Wayland display and may break in a multi-display scenario (when testing on a nested compositor, for instance). Does EGL_PLATFORM_WAYLAND_KHR use the actual Wayland display?
Anyway, Firefox creates GL context on Wayland differently:
https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLContextProviderEGL.cpp#1151
i.e. Create EGLSurface over wl_surface:
https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLContextProviderEGL.cpp#836
and then create a GL context for it. That way you're sure you're running on the correct GFX card/Wayland display.
eglGetPlatformDisplayEXT(EGL_PLATFORM_GBM_KHR,...) for EGL/X11 may work, but you need the correct DRM device: you should get it from the GL context in use and not just pick it randomly. Otherwise you may end up with the wrong GFX card, at least on mixed Intel/NVIDIA systems.
Comment 12•2 years ago (Assignee)
(In reply to Catalin Sasca, QA [:csasca] from comment #10)
> As for the egl_dmabuf.cc, I couldn't find it anywhere in the system unfortunately (using the search in files option).
That's a source file in Firefox (libwebrtc). I was hoping you would be able to build FF yourself with some modifications.
But I can see that logging from WebRTC can be enabled, see: https://wiki.mozilla.org/Media/WebRTC/Logging. It should be the "getUserMedia" stuff I believe, so perhaps "MOZ_LOG=MediaManager:4,GetUserMedia:4" will make it work? I will try locally myself and let you know what worked for me.
Comment 13•2 years ago
Here's how Firefox matches the GL/Wayland display:
https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLLibraryEGL.cpp#903
Comment 14•2 years ago
(In reply to grulja from comment #12)
> (In reply to Catalin Sasca, QA [:csasca] from comment #10)
> > As for the egl_dmabuf.cc, I couldn't find it anywhere in the system unfortunately (using the search in files option).
> That's a source file in Firefox (libwebrtc). I was hoping you are able to build FF yourself with some modifications.
I'm not sure what you mean here. Is egl_dmabuf.cc used by any other project, or is it Firefox-specific code while Chrome/Chromium runs something different?
Comment 15•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #11)
> Actually the egl_dmabuf code works as expected but it doesn't consider XWayland scenario here or multi GPU setup.
> Jan I expect you want to upstream that code, right? In such case we should use general gtk calls to check used display type.
> You should use something like GdkIsWaylandDisplay():
> https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/widget/gtk/WidgetUtilsGtk.cpp#103
> To check if Firefox is running on Wayland. Otherwise you should use X11/EGL.
> Also EGL_PLATFORM_WAYLAND_KHR/EGL_PLATFORM_GBM_KHR is not optimal solution here but I don't know what code Google uses for it.
This code is used only on Wayland; there is a check in WebRTC for that:
https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/desktop_capture/desktop_capturer.cc;l=105
> eglGetPlatformDisplay(EGL_PLATFORM_WAYLAND_KHR,...) uses default Wayland display and may break in multi-display scenario (when testing on nested compositor for instance). Does EGL_PLATFORM_WAYLAND_KHR use actual Wayland display?
From the documentation: "To obtain an EGLDisplay backed by a Wayland display, call eglGetPlatformDisplay with <platform> set to EGL_PLATFORM_WAYLAND_KHR."
So I assume it does what we need? Also, the issue here is not a multi-display scenario.
> Anyway, Firefox creates GL context on Wayland differently:
> https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLContextProviderEGL.cpp#1151
> i.e. Create EGLSurface over wl_surface:
> https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLContextProviderEGL.cpp#836
> and then create GL context for it. You're sure you're running on correct GFX card/wayland display then.
I'll check, thanks.
> eglGetPlatformDisplayEXT(EGL_PLATFORM_GBM_KHR,...) for EGL/X11 may work but you need correct drm device - you should get it from used GL context and not just pick it randomly. You may end up with wrong gfx card on mixed Intel/NVIDIA systems at least.
We have this just as a fallback in case eglGetPlatformDisplay() fails to give us what we want. I will explore what you posted above.
Comment 16•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #14)
> (In reply to grulja from comment #12)
> > (In reply to Catalin Sasca, QA [:csasca] from comment #10)
> > > As for the egl_dmabuf.cc, I couldn't find it anywhere in the system unfortunately (using the search in files option).
> > That's a source file in Firefox (libwebrtc). I was hoping you are able to build FF yourself with some modifications.
> I'm not sure what do you mean here. Is egl_dmabuf.cc used by any other project or is that Firefox specific code and Chrome/Chromium runs something different?
I meant that, in case WebRTC logging from that file cannot be enabled, you could just use something like 's/RTC_LOG(LS_ERROR)/std::cout/' or whatever else will produce some output.
Comment 17•2 years ago (Assignee)
(In reply to grulja from comment #15)
> (In reply to Martin Stránský [:stransky] (ni? me) from comment #11)
> > Anyway, Firefox creates GL context on Wayland differently:
> > https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLContextProviderEGL.cpp#1151
> > i.e. Create EGLSurface over wl_surface:
> > https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/gfx/gl/GLContextProviderEGL.cpp#836
> > and then create GL context for it. You're sure you're running on correct GFX card/wayland display then.
> I'll check, thanks.
I'm afraid we won't be able to use this, because WebRTC doesn't have the Wayland library available; it's only in Chromium.
Comment 18•2 years ago
(In reply to grulja from comment #15)
> (In reply to Martin Stránský [:stransky] (ni? me) from comment #11)
> > Actually the egl_dmabuf code works as expected but it doesn't consider XWayland scenario here or multi GPU setup.
> > Jan I expect you want to upstream that code, right? In such case we should use general gtk calls to check used display type.
> > You should use something like GdkIsWaylandDisplay():
> > https://searchfox.org/mozilla-central/rev/00ea1649b59d5f427979e2d6ba42be96f62d6e82/widget/gtk/WidgetUtilsGtk.cpp#103
> > To check if Firefox is running on Wayland. Otherwise you should use X11/EGL.
> > Also EGL_PLATFORM_WAYLAND_KHR/EGL_PLATFORM_GBM_KHR is not optimal solution here but I don't know what code Google uses for it.
> This code is used only on Wayland, there is check in WebRTC for that:
> https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/desktop_capture/desktop_capturer.cc;l=105
Well, with Chrome/Chromium moving to Wayland it would be good to fix that in a general way and not hack it for Firefox only. Chrome/Chromium will hit the same issues sooner or later (at least the EGL/X11 ones and the Wayland/XWayland scenarios on Intel/NVIDIA laptops).
> > eglGetPlatformDisplay(EGL_PLATFORM_WAYLAND_KHR,...) uses default Wayland display and may break in multi-display scenario (when testing on nested compositor for instance). Does EGL_PLATFORM_WAYLAND_KHR use actual Wayland display?
> From the documentation: "To obtain an EGLDisplay backed by a Wayland display, call eglGetPlatformDisplay with <platform> set to EGL_PLATFORM_WAYLAND_KHR."
See Bug 1773377 for instance. If you want GL to use the same display/setup as Firefox itself, you need to create it directly over something you're already using (the current X11/Wayland display connection or so). Otherwise GL tends to pick whatever it likes, so you may end up using NVIDIA on Wayland while Firefox is running X11 on Intel, as this bug says :)
Comment 19•2 years ago
(In reply to grulja from comment #17)
> I'm afraid we won't be able to use this, because WebRTC doesn't have Wayland library available, it's only in Chromium.
You can use dlsym then; only a few functions are needed:
gdk_wayland_display_get_type()
gdk_x11_display_get_type()
wl_compositor_create_surface()
wl_surface_destroy()
wl_egl_window_create()
wl_egl_window_destroy()
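The dlsym approach could look roughly like the sketch below. `LookupOptionalSymbol`, `GetTypeFn`, `LoadGdkWaylandDisplayGetType`, and the soname `libgdk-3.so.0` are assumptions made for illustration, not code from Firefox or libwebrtc; only `dlopen` and `dlsym` are the standard POSIX calls.

```cpp
#include <dlfcn.h>

// Look up `symbol` in the shared library `library` at run time.
// Returns nullptr if either the library or the symbol is unavailable.
// The library handle is intentionally never closed, so that a returned
// pointer stays valid for the lifetime of the process.
void* LookupOptionalSymbol(const char* library, const char* symbol) {
  void* handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
  if (!handle) {
    return nullptr;
  }
  return dlsym(handle, symbol);
}

// Assumed signature: gdk_wayland_display_get_type() takes no arguments and
// returns a GType, modelled here as unsigned long.
using GetTypeFn = unsigned long (*)();

// Try to resolve gdk_wayland_display_get_type without linking against GDK.
// "libgdk-3.so.0" is an assumed soname for GTK 3.
GetTypeFn LoadGdkWaylandDisplayGetType() {
  return reinterpret_cast<GetTypeFn>(
      LookupOptionalSymbol("libgdk-3.so.0", "gdk_wayland_display_get_type"));
}
```

A null result means the library or symbol is missing, in which case the caller would skip the Wayland-specific path rather than fail.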
Comment 20•2 years ago
This is a reminder regarding comment #1!
The bug is marked as tracked for firefox111 (beta) and tracked for firefox112 (nightly). We have limited time to fix this, the soft freeze is in 3 days. However, the bug still isn't assigned.
Comment 21•2 years ago
Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Comment 22•2 years ago
I think it crashes if desktop portal selects dmabuf sharing.
Until it's fixed we can disable it by removing SPA_DATA_DmaBuf from supported buffer types:
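As a rough sketch of what removing SPA_DATA_DmaBuf from the supported buffer types amounts to, assuming the types are advertised as a bitmask of `1 << type` flags. The enum below is a local stand-in for PipeWire's `spa_data_type` (real code would take the values from the PipeWire headers), and `SupportedBufferTypes` is a hypothetical helper.

```cpp
#include <cstdint>

// Local stand-ins for the PipeWire buffer data types discussed here.
// The numeric values are assumptions for this sketch.
enum DataType : uint32_t { kMemPtr = 1, kMemFd = 2, kDmaBuf = 3 };

// Build the bitmask of buffer types offered during stream negotiation.
// With allow_dmabuf == false, only shared-memory types are offered.
uint32_t SupportedBufferTypes(bool allow_dmabuf) {
  uint32_t mask = (1u << kMemFd) | (1u << kMemPtr);
  if (allow_dmabuf) {
    mask |= 1u << kDmaBuf;
  }
  return mask;
}
```

Keeping the decision behind a single flag would make it easy to switch dmabuf back on once the crash is understood.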
Comment 23•2 years ago
AFAIK SPA_DATA_DmaBuf data sharing doesn't add any extra value for us, as we need to convert it and download it to user space anyway. Right now we don't support creating a Texture/Data client over an imported dmabuf and using it as a texture for rendering (if that's even supported on the WebRTC side), so it only adds extra complications with the dmabuf user-space import.
Jan, please correct me if I'm wrong.
Comment 24•2 years ago (Assignee)
I have been in touch with Jonas Adahl (Mutter maintainer); we have been talking about this since Friday, investigating what's going on, trying to reproduce, etc. What we actually think is that this is not an issue in Firefox or WebRTC, but rather a driver or configuration issue, as Firefox seems to be using one graphics card for X11 while picking the other one for Wayland (as can be seen with eglinfo).
Using DmaBuf makes a big difference to performance. On the compositor side the whole desktop is less sluggish when it doesn't need to download everything every frame; that download can, for example, reduce performance when playing games, where you can go from 60fps to 30fps.
Comment 25•2 years ago (Assignee)
Anyway, we figured out a possible workaround. I'm going to submit it to WebRTC and we can pick it up from there.
Comment 26•2 years ago
(In reply to grulja from comment #24)
> Using DmaBuf makes a big difference on the performance. On the compositor side the whole desktop is less sluggish without needing to download everything every frame, which can for example reduce performance when playing games where you can go from 60fps to 30fps.
I mean, how exactly is performance improved when dmabuf is used with WebRTC? Looking at ImageFromDmaBuf():
you just download the video pixels to user space without any prior use on the GPU side. I don't see any benefit here; the GPU->CPU pixel transfer is just moved from Mutter to Firefox, but the result is the same.
Comment 27•2 years ago (Assignee)
The difference is mainly on the compositor side (Mutter or KWin) and Jonas stated it's significant as you avoid copying every frame into the shared buffer, instead you just pass it as a fd so you don't copy anything there.
Comment 28•2 years ago
(In reply to grulja from comment #27)
> The difference is mainly on the compositor side (Mutter or KWin) and Jonas stated it's significant as you avoid copying every frame into the shared buffer, instead you just pass it as a fd so you don't copy anything there.
Yes, they clearly push the dirty work onto clients and call that a "feature". You need to do that copy anyway if you want to access GPU data from the CPU.
Comment 29•2 years ago (Assignee)
I submitted the fix to WebRTC here: https://webrtc-review.googlesource.com/c/src/+/296420.
Is there a way to create a testing FF build? E.g. when I submit this as a FF change?
Comment 30•2 years ago (Assignee)
Because of a possible misconfiguration or a possible driver issue, the
browser might use one driver on X11 and end up using a different one
for wayland/gbm. In the better case this leads to non-working screen
sharing, but it can also lead to a crash in the other driver (Nvidia).
This adds a check for the platform the browser runs on, XWayland or
Wayland, and queries the EGL display for that specific platform rather
than always going for the Wayland one.
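One way to picture the check this patch description talks about, as a hedged sketch rather than the actual change: `EglPlatform` and `PlatformPreference` are hypothetical names, and the GBM entry reflects the fallback mentioned in comment 15.

```cpp
#include <vector>

// Hypothetical labels for the EGL platforms discussed in this bug.
enum class EglPlatform { kWayland, kX11, kGbm };

// Order in which EGL platforms would be tried. `running_on_wayland` says
// whether the browser itself is a native Wayland client (true) or an X11
// client running under XWayland (false).
std::vector<EglPlatform> PlatformPreference(bool running_on_wayland) {
  if (running_on_wayland) {
    return {EglPlatform::kWayland, EglPlatform::kGbm};
  }
  return {EglPlatform::kX11, EglPlatform::kGbm};
}
```

The caller would walk the list and stop at the first platform for which an EGL display can be obtained and initialized.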
Comment 31•2 years ago
(In reply to grulja from comment #29)
> I submitted the fix to WebRTC here: https://webrtc-review.googlesource.com/c/src/+/296420.
> Is there a way to create a testing FF build? E.g. when I submit this as a FF change?
No, unfortunately our build modifications are not really portable upstream. We are working towards eliminating our build changes where possible. I think the way forward there will be to remove the build_with_mozilla flag and decompose our needs into a set of feature flags that are more generally applicable. Still, there are probably differences in build environment that will prevent upstream builds from ever being identical to ours.
grulja, if this lands upstream quickly we can cherry-pick the patch into our current libwebrtc update. Do you need to land this here quickly, e.g. for uplifting to beta?
I am adding Andreas Pehrsons (:pehrsons), as he is doing the current update.
Comment 32•2 years ago
(In reply to grulja from comment #29)
> I submitted the fix to WebRTC here: https://webrtc-review.googlesource.com/c/src/+/296420.
> Is there a way to create a testing FF build? E.g. when I submit this as a FF change?
If you mean testing as in manual testing, sure, we can just apply it locally and push to try. You can pull the build off try through treeherder when it's done. Let me know if you want help doing this.
Please ping me if this lands upstream this week and I can cherry-pick.
Comment 33•2 years ago
I pushed a try build: https://treeherder.mozilla.org/jobs?repo=try&revision=7c1ca3b1f7972ace7ac44782c7502df61121480e
A B job for linux64/opt will appear. When green, click it, select the artifacts tab and download target.tar.bz2.
Extract this tarball (assuming to the current working dir) and you can run it through firefox/firefox.
Comment 34•2 years ago
If you want this in 112 or even 111 (through beta uplift) you'll have to land to central first and we can replace it with a cherry-pick later. Probably best to have it approved upstream first in any case.
Comment 35•2 years ago
Catalin, could you try the repro steps again on the same machine with the build in comment 33?
Comment 36•2 years ago
(In reply to Nico Grunbaum [:ng, @chew:mozilla.org] from comment #31)
> (In reply to grulja from comment #29)
> > I submitted the fix to WebRTC here: https://webrtc-review.googlesource.com/c/src/+/296420.
> > Is there a way to create a testing FF build? E.g. when I submit this as a FF change?
Sorry, I misinterpreted this to mean a build in Google's CI. If you don't have try access to run your own builds, that can be easily fixed. See: https://www.mozilla.org/en-US/about/governance/policies/commit/access-policy/ (Level 1 - Try/User/Incubator Access). I'll provide a voucher if you need one.
Comment 37•2 years ago (Reporter)
Sure thing, just tried with the try build in Comment 33, but the crash is still present unfortunately.
Comment 38•2 years ago (Assignee)
(In reply to Catalin Sasca, QA [:csasca] from comment #37)
> Sure thing, just tried with the try build in Comment 33, but the crash is still present unfortunately.
Does it end up with the same backtrace?
Comment 39•2 years ago (Assignee)
Is it possible to get a debug build that prints all environment variables set/used when Firefox is running?
Comment 40•2 years ago
Catalin, can you uninstall the /usr/lib/libEGL_nvidia.so.0 file? It looks like a leftover from a previous installation of the NVIDIA proprietary drivers.
Comment 41•2 years ago (Reporter)
Yes grulja, it was the same backtrace. After reading what Martin said, I did a complete purge of the Nvidia drivers and cleanly reinstalled the v525 metapackage driver (proprietary), and it seems I can't reproduce the crash anymore. (What I see now is that the Nvidia X server settings has fewer options than before.)
Comment 42•2 years ago (Assignee)
Happy it worked out. I wish we had just tried that option earlier :). I assume this issue can be closed now.
Comment 43•2 years ago
Note that the fix here is still correct. It's needed for Wayland/XWayland and multi-GPU scenarios. The best solution may be to read the DRM device from OpenGL and create a headless (DRM) GL context over it. We use the same approach for dmabuf/vaapi and dmabuf snapshots in Firefox.
Using the correct display is the minimum. Right now the PipeWire code uses Wayland even if the browser runs on X11 (XWayland), which is "interesting" :)
Comment 44•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #43)
> Note that the fix here is still correct. It's needed for Wayland/XWayland and multi GPU scenarios. The best solution may be to read DRM device from OpenGL and create headless (drm) GL context over it. We do the same approach for dmabuf/vaapi and dmabuf snapshots in Firefox.
> To use correct display is minimum. Right now the pipewire code uses Wayland even if browser run on X11 (Xwayland) which is "interesting" :)
I was told by Jonas that getting the display through EGL_PLATFORM_X11_KHR should give you the same one, as it goes through XWayland and Glamor, which gets it from Wayland. In the multi-GPU scenario you should always get the same EGL display for both Firefox and the Wayland/gbm path used in the screen sharing code.
I will revisit this once I find some time and check how to make it less error-prone.
Comment 45•2 years ago
(In reply to grulja from comment #44)
> (In reply to Martin Stránský [:stransky] (ni? me) from comment #43)
> > Note that the fix here is still correct. It's needed for Wayland/XWayland and multi GPU scenarios. The best solution may be to read DRM device from OpenGL and create headless (drm) GL context over it. We do the same approach for dmabuf/vaapi and dmabuf snapshots in Firefox.
> > To use correct display is minimum. Right now the pipewire code uses Wayland even if browser run on X11 (Xwayland) which is "interesting" :)
> I was told by Jonas that getting the display through EGL_PLATFORM_X11_KHR should give you the same one, as it goes through XWayland and Glamor, which gets it from Wayland. In the multi GPU scenario you should always get the same EGL display for both Firefox and Wayland/gbm used in the screen sharing code.
Please test that yourself.
- Load Firefox in a debugger as:
MOZ_ENABLE_WAYLAND=0 gdb ./firefox
which configures Firefox to run on XWayland even when Wayland is available (that's the default right now for release).
- Put a breakpoint on wl_display_get_registry
- Run Firefox and start screen sharing
- See the backtrace:
#0 wl_display_get_registry (wl_display=0x7fffb640ae70) at /usr/include/wayland-client-protocol.h:1064
#1 dri2_initialize_wayland_drm (disp=0x7fffb7321600) at ../src/egl/drivers/dri2/platform_wayland.c:2118
#2 dri2_initialize_wayland (disp=disp@entry=0x7fffb7321600) at ../src/egl/drivers/dri2/platform_wayland.c:2777
#3 0x00007fffc9e28c40 in dri2_initialize (disp=disp@entry=0x7fffb7321600) at ../src/egl/drivers/dri2/egl_dri2.c:1205
#4 0x00007fffc9e1a4a1 in eglInitialize (dpy=<optimized out>, major=0x7fffffffb53c, minor=0x7fffffffb538) at ../src/egl/main/eglapi.c:703
#5 0x00007ffff00ac937 in webrtc::EglDmaBuf::EglDmaBuf() (this=0x7fffb5f09b80) at /raid/src4/third_party/libwebrtc/modules/desktop_capture/linux/wayland/egl_dmabuf.cc:375
#6 0x00007ffff00b3b29 in std::make_unique<webrtc::EglDmaBuf>() ()
at /home/komat/.mozbuild/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/unique_ptr.h:821
#7 webrtc::SharedScreenCastStreamPrivate::StartScreenCastStream(unsigned int, int, unsigned int, unsigned int) (this=0x7fffc2c72200, stream_node_id=101, fd=186, width=0, height=0)
at /raid/src4/third_party/libwebrtc/modules/desktop_capture/linux/wayland/shared_screencast_stream.cc:395
#8 0x00007ffff00aa22b in webrtc::BaseCapturerPipeWire::OnScreenCastRequestResult(webrtc::xdg_portal::RequestResponse, unsigned int, int)
(this=0x7fffba6bbc80, result=webrtc::xdg_portal::RequestResponse::kSuccess, stream_node_id=0, fd=0)
at /raid/src4/third_party/libwebrtc/modules/desktop_capture/linux/wayland/base_capturer_pipewire.cc:60
#9 0x00007ffff00af82c in webrtc::ScreenCastPortal::OnPortalDone(webrtc::xdg_portal::RequestResponse) (this=0x7fffb720d010, result=webrtc::xdg_portal::RequestResponse::kUnknown)
at /raid/src4/third_party/libwebrtc/modules/desktop_capture/linux/wayland/screencast_portal.cc:135
#10 0x00007ffff00b0c5c in webrtc::ScreenCastPortal::OnOpenPipeWireRemoteRequested(_GDBusProxy*, _GAsyncResult*, void*)
(proxy=<optimized out>, result=<optimized out>, user_data=0x7fffb720d010) at /raid/src4/third_party/libwebrtc/rtc_base/logging.h:406
So perhaps the same device is used somewhere deep down, but we're getting the config through dri2_initialize_wayland() and we configure dmabuf via zwp_linux_dmabuf_v1_get_version() (if that extension is available). See egl/drivers/dri2/platform_wayland.c in Mesa for details.
Comment 46•2 years ago
Also note that there's nsIGfxInfo::FEATURE_DMABUF that we disable on broken NVIDIA drivers (latest one is Bug 1820055).
Without such check expect browser crashes. So nsIGfxInfo::FEATURE_DMABUF may be checked before we enable it for screensharing.
Comment 47•2 years ago
As a workaround I propose to disable dmabuf screen sharing for the current dev cycle (as we're in soft freeze now) and enable it when it's fixed.
We may backport that to beta too.
Comment 48•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #46)
> Also note that there's nsIGfxInfo::FEATURE_DMABUF that we disable on broken NVIDIA drivers (latest one is Bug 1820055).
> Without such check expect browser crashes. So nsIGfxInfo::FEATURE_DMABUF may be checked before we enable it for screensharing.
That would effectively disable DmaBuf support for screen sharing on all released Nvidia drivers, without any indication that it causes issues. Please don't do so. We have been testing Nvidia drivers with DmaBufs during the development of this, and I have many users using it without any issue in Chrome.
Comment 49•2 years ago
See the crash stats: https://crash-stats.mozilla.org/signature/?product=Firefox&signature=NvGlEglGetFunctions
Dmabuf bug in NVIDIA drivers was confirmed by NVIDIA developer: https://bugzilla.mozilla.org/show_bug.cgi?id=1788573#c26
Comment 50•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #49)
> See the crash stats: https://crash-stats.mozilla.org/signature/?product=Firefox&signature=NvGlEglGetFunctions
> Dmabuf bug in NVIDIA drivers was confirmed by NVIDIA developer: https://bugzilla.mozilla.org/show_bug.cgi?id=1788573#c26
Yes, but it's not related to DmaBuf screen sharing.
Comment 51•2 years ago (Assignee)
And also, back to the "getting EGL display" thing: I believe we need to process the GBM buffers we get from the compositor with the same device/driver that produced them, therefore I believe the way we get the EGL display is correct. I don't think it matters that much whether Firefox runs as an XWayland app.
Comment 52•2 years ago
(In reply to grulja from comment #50)
> Yes, but it's not related to DmaBuf screen sharing.
Why do you think so? I see the crash comes from eglDestroyImage() when an EGLImage is created over a dmabuf, and it's a use-after-free bug in the NVIDIA drivers. Both dmabuf and EGLImages are used for screen sharing, so I don't understand your argument.
Comment 53•2 years ago
(In reply to grulja from comment #51)
> And also back the "getting EGL display" thing. I believe we need to process GBM buffers we get from the compositor with the same device/driver as they were produced, therefore I believe the way we get the EGL display is correct. I don't think it matters that much whether Firefox runs as an XWayland app.
Well, the whole browser runs as an X11 application, but you're blindly opening a Wayland connection to the compositor from WebRTC just to download dmabuf memory to the CPU. I'd expect such behavior from a tutorial maybe, but not from a real application.
Comment 54•2 years ago (Assignee)
(In reply to Martin Stránský [:stransky] (ni? me) from comment #53)
> Well, whole browser runs as X11 application but you're blindly opening Wayland connection to compositor from WebRTC just to download dmabuf memory to CPU. I'd expect such behavior for tutorial maybe but not for real application.
This code runs only on Wayland; it's not going to be used on X11. We also want to use the same EGL display as the compositor that sends us GBM buffers, therefore using EGL_PLATFORM_WAYLAND_KHR, which gets it from the default wl_display, seems correct to me. Anyway, I'm not saying this is a 100% error-proof solution, but it is what it is given my limited knowledge. You are always free to improve it and submit a patch.
(In reply to Martin Stránský [:stransky] (ni? me) from comment #52)
> (In reply to grulja from comment #50)
> > Yes, but it's not related to DmaBuf screen sharing.
> Why do you think so? I see the crash comes from eglDestroyImage() when EGLImage is created over dmabuf and it's use-after-free bug in NVIDIA drivers. Both dmabuf and EGLImages are used for screensharing ... so I don't understand your argument.
The difference here might be that FF uses GFX, while here we use EGL? Again, I'm not really knowledgeable in this field, but I'm 100% sure that if there was a bug in the Nvidia driver, we would have seen this issue in Chrome already.
Comment 55•2 years ago
Let's land the correct display type fix at least.
Comment 56•2 years ago
Comment 57•2 years ago
Backed out for causing Bug 1821388 as requested here
Comment 58•2 years ago
bugherder |
Comment 59•2 years ago
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/9d24cb6b9856
Comment 61•2 years ago
Comment 62•2 years ago
(In reply to grulja from comment #54)
but I'm 100% sure if there was a bug in Nvidia driver, we would have seen this issue in Chrome already.
(I'm just a user/tester.)
IIUC, Chrome doesn't use EGL on X11 (it uses EGL via ANGLE via GLX), few Nvidia users use Wayland, and few users have WebGL running during suspend and resume. The crash rate spiked when accelerated Canvas2D (=Canvas2D translated to WebGL) was rolled out; that's when we took it seriously.
Assignee | ||
Comment 63•2 years ago
(In reply to Darkspirit from comment #62)
(In reply to grulja from comment #54)
but I'm 100% sure if there was a bug in Nvidia driver, we would have seen this issue in Chrome already.
(I'm just a user/tester.)
IIUC, Chrome doesn't use EGL on X11 (it uses EGL via ANGLE via GLX), few Nvidia users use Wayland, and few users have WebGL running during suspend and resume. The crash rate spiked when accelerated Canvas2D (=Canvas2D translated to WebGL) was rolled out; that's when we took it seriously.
I meant specifically the screen sharing code in WebRTC, which has been tested many times on Nvidia drivers as well.
Comment 64•2 years ago
(In reply to grulja from comment #63)
I meant specifically the screen sharing code in WebRTC, which has been tested many times on Nvidia drivers as well.
If you have an unusually perfectly configured Nvidia driver (by enabling the Nvidia suspend service and NVreg_PreserveVideoMemoryAllocations=1), then EGL is not affected by bug 1788573 comment 26.
If any dmabuf/EGLImage usage ends on suspend (shortly before) in a way that the app, after resume, doesn't let the driver try to access something that is gone, then it might not be affected by bug 1788573 comment 26, but I don't know yet.
Later, I will test screen sharing a bit, I think the most relevant aspect is when a preview of my screen is shown to me.
At the moment, applying the dmabuf blocklist seems reasonable because why would anyone want the risk of a confirmed use-after-free EGL driver bug?
My naive questions to myself are:
If I open https://www.webrtc-experiment.com/Pluginfree-Screen-Sharing/ in Chromium and Firefox, there is a preview of my screen:
- Is dmabuf screen sharing (actually used and)
a) copied and shown like an image/canvas? That might be unaffected, but not performant.
b) directly shown as if it would be dmabuf webgl? That would likely run into the problem.
c) copied, software encoded, and shown as decoded video? No one would do that.
- How does Chrome import a dmabuf for the preview if it actually uses GLX? Does it directly use EGL on ChromeOS (Wayland), but EGL via ANGLE via GLX on Nvidia? (Or does Chrome directly use EGL on X11 for hardware rendering - since when?)
Comment 65•2 years ago
(In reply to Darkspirit from comment #64)
My naive questions to myself are:
If I open https://www.webrtc-experiment.com/Pluginfree-Screen-Sharing/ in Chromium and Firefox, there is a preview of my screen:
- Is dmabuf screen sharing (actually used and)
a) copied and shown like an image/canvas? That might be unaffected, but not performant.
b) directly shown as if it would be dmabuf webgl? That would likely run into the problem.
c) copied, software encoded, and shown as decoded video? No one would do that.
The backend seems to unconditionally download it to a cpu buffer, then we convert it to I420 shortly after, and that's what ends up (some copies later) in the compositor.
Comment 66•2 years ago
bugherder |
Assignee | ||
Comment 67•2 years ago
(In reply to Andreas Pehrson [:pehrsons] from comment #65)
(In reply to Darkspirit from comment #64)
My naive questions to myself are:
If I open https://www.webrtc-experiment.com/Pluginfree-Screen-Sharing/ in Chromium and Firefox, there is a preview of my screen:
- Is dmabuf screen sharing (actually used and)
a) copied and shown like an image/canvas? That might be unaffected, but not performant.
b) directly shown as if it would be dmabuf webgl? That would likely run into the problem.
c) copied, software encoded, and shown as decoded video? No one would do that.
The backend seems to unconditionally download it to a cpu buffer, then we convert it to I420 shortly after, and that's what ends up (some copies later) in the compositor.
Well, "unconditionally". There is a whole negotiation process between us and the Wayland compositor: both sides need to actually announce DMAbuf support and agree on it. I assume the compositor knows when it can be used, and on our side we check whether the driver has all the needed extensions. This happens only in a Wayland session; PipeWire is therefore not used in X11 sessions.
Assignee | ||
Comment 68•2 years ago
Also, in case we fail to import a DMAbuf, the whole stream will be renegotiated and a different kind of buffer (e.g. MemFD) will be used instead.
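To illustrate the fallback idea, here is a minimal sketch, not the actual libwebrtc code. The names `BufferKind`, `NegotiateBufferKind` and `try_import` are hypothetical; `try_import` stands in for the real import step (for DMA-BUFs, creating an EGLImage from the buffer), and the loop stands in for renegotiating the stream.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Hypothetical buffer kinds for illustration; these are not PipeWire's names.
enum class BufferKind { DmaBuf, MemFd };

// Walks the buffer kinds in order of preference and returns the first one
// whose import succeeds. A failed import simply moves on to the next kind,
// which models "renegotiate the stream with a different kind of buffer".
std::optional<BufferKind> NegotiateBufferKind(
    const std::vector<BufferKind>& preferred,
    const std::function<bool(BufferKind)>& try_import) {
  for (BufferKind kind : preferred) {
    if (try_import(kind)) {
      return kind;  // Import worked, keep this buffer kind.
    }
    // Import failed: fall through and try the next kind.
  }
  return std::nullopt;  // Nothing could be imported.
}
```

Calling it with `{BufferKind::DmaBuf, BufferKind::MemFd}` and an import callback that rejects DMA-BUFs ends up on `BufferKind::MemFd`, which is the behaviour described above.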
Comment 69•2 years ago
If Dmabuf screen sharing isn't used to achieve zero copy somewhere, then it has no performance benefit over MemFD screen sharing, right?
In my understanding there would be no point in using the dmabuf code path unless it is wired up for zero-copy presentation in EGL hardware rendering (the way egl dmabuf webgl/vaapi are used).
That's how I understand comment 26 + comment 65.
comment 43:
On Wayland you can be running as Xwayland or as Wayland and if you use EGL without display connection, you might end up using the wrong gpu or wrong egl implementation:
bug 1769499: Firefox ran as native Wayland app on Nvidia (MOZ_ENABLE_WAYLAND=1 env var, enabled by default on Firefox Nightly)
- used headless egl (it was actually Nvidia's X11 EGL implementation) in RDD process to make EGLImage from vaExportSurfaceHandle
- tried to use X11-EGL EGLImage in Wayland-EGL hardware rendering and failed.
- Workaround was running Firefox with EGL_PLATFORM=wayland nvidia driver environment variable.
bug 1773377: Firefox ran as native Wayland app on Nvidia while also having an Intel GPU
- used egl surfaceless platform (which defaults to renderD128: Intel in this case) in RDD process to make EGLImage from vaExportSurfaceHandle
- tried to use EGLImage (from Intel GPU) in Wayland-EGL hardware rendering on Nvidia GPU and failed.
- Fixed by using egl device platform with correct device in RDD process (/dev/dri/renderD129 can be obtained from EGL_DRM_DEVICE_FILE_EXT from EGL context with display connection and be compared with EGL_DRM_RENDER_NODE_FILE_EXT from enumerated devices)
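The device matching described for bug 1773377 boils down to a path comparison. Below is a rough sketch of just that comparison, assuming the paths have already been queried. `EnumeratedDevice` and `FindMatchingDevice` are hypothetical names; in real code the strings would come from EGL device queries (e.g. `eglQueryDeviceStringEXT()`), which are left out here so the example stays self-contained.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one enumerated EGL device. In real code the
// render-node path would be queried with EGL_DRM_RENDER_NODE_FILE_EXT.
struct EnumeratedDevice {
  int index;                // Position in the enumerated device list.
  std::string render_node;  // For example "/dev/dri/renderD129".
};

// Returns the index of the enumerated device whose render node equals the
// path reported for the display-connected context, or nullopt if none match.
std::optional<int> FindMatchingDevice(
    const std::vector<EnumeratedDevice>& devices,
    const std::string& wanted_render_node) {
  for (const EnumeratedDevice& device : devices) {
    if (device.render_node == wanted_render_node) {
      return device.index;
    }
  }
  return std::nullopt;
}
```

Returning an empty optional when nothing matches leaves the caller free to decide on a fallback instead of silently picking the first device (renderD128 in the Intel + Nvidia case above).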
Overview of combinations supported by Firefox:
platform | WebRender type | WebGL type | hw decoding¹ | for which drivers² |
---|---|---|---|---|
wayland (enabled on nightly) | egl hardware rendering | egl dmabuf webgl | egl dmabuf vaapi | mesa >= 17, nvidia > 530.30.02 |
wayland (enabled on nightly) | egl hardware rendering | egl webgl (copy) | - | nvidia >= 470.82 |
wayland (enabled on nightly) | software rendering | egl webgl (copy) | - | mesa < 17, nvidia < 470.82 |
x11 & xwayland | egl hardware rendering | egl dmabuf webgl | egl dmabuf vaapi | mesa >= 17 (since bug 1818992), nvidia > 530.30.02 |
x11 & xwayland | egl hardware rendering | egl webgl (copy) | - | nvidia >= 470.82 |
x11 & xwayland | glx hardware rendering | glx webgl (copy) | - | - |
x11 & xwayland | software rendering | egl webgl (copy) | - | - |
x11 & xwayland | software rendering | glx webgl (copy) | - | mesa < 17, nvidia < 470.82 |
¹) based on vaExportSurfaceHandle: Enabled on nightly with mesa>=21, can be manually enabled for older mesa or nvidia-vaapi-driver.
²) nouveau, xwayland and arm64 have slightly different rules
Comment 70•2 years ago
(In reply to grulja from comment #67)
Well, unconditionally. There is a whole negotiation process between us and the wayland compositor, both sides need to actually announce DMAbuf support and agree on it. I assume the compositor does know when it can be used and on our side we check whether the driver has all the needed extensions. This is only happening on Wayland session and PipeWire is therefore not used on X11 sessions.
My point being that there's no path for that dmabuf to propagate all the way to the (gecko) compositor for rendering.
Assignee | ||
Comment 71•2 years ago
There is zero copy on the compositor side, which has a huge impact on the overall performance.
We get the EGL display from eglGetPlatformDisplay() using EGL_PLATFORM_WAYLAND_KHR, which according to the documentation gets it from the default wl_display. This has worked so far without any issue, but I'm not saying there is no room for improvement; extending this with more checks to be sure we use the correct one would definitely be better. I definitely need to learn more about this stuff.
Comment 72•2 years ago
I tested on my i7 laptop with gnome-shell / Fedora 37 / Wayland / latest Nightly, and on the current setup dmabuf sharing uses more CPU than the non-dmabuf one. See the screenshots below.
Comment 73•2 years ago
DMAbuf screen sharing disabled
Comment 74•2 years ago
DMAbuf screen sharing enabled
Comment 75•2 years ago
So with the current configuration, DMABuf is less efficient than the non-DMABuf path. Note that the pictures measure the CPU usage of the whole system, i.e. compositor + Firefox. Looking at detailed CPU usage, Firefox uses 50-60% CPU with DMABuf and 30-40% without it.
DMABuf may have one advantage: the compositor may hand us raw GPU buffers, and we can drop them without any processing if nobody needs them (Bug 1820971). In my testing, that may reduce screen sharing CPU usage by 50%.
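A minimal sketch of the "drop buffers nobody needs" idea, assuming a single consumer that only ever wants the newest frame. This is an illustration, not necessarily what Bug 1820971 implements; `Frame` and `LatestFrameSlot` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical frame handle; a real one would reference a GPU buffer.
struct Frame {
  std::uint64_t sequence;
};

// Keeps only the most recent frame. A frame that is replaced before anyone
// takes it is dropped without ever being processed.
class LatestFrameSlot {
 public:
  // Stores a new frame. Returns true if an unconsumed frame was dropped.
  bool Push(Frame frame) {
    const bool dropped = pending_.has_value();
    if (dropped) {
      ++dropped_count_;
    }
    pending_ = frame;
    return dropped;
  }

  // Hands out the pending frame, if any, and empties the slot.
  std::optional<Frame> Take() {
    std::optional<Frame> out = pending_;
    pending_.reset();
    return out;
  }

  std::uint64_t dropped_count() const { return dropped_count_; }

 private:
  std::optional<Frame> pending_;
  std::uint64_t dropped_count_ = 0;
};
```

The expensive step (download to the CPU, conversion) would then run only on whatever `Take()` returns, so frames that arrive faster than they are consumed cost nothing beyond the push.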
Comment 76•2 years ago
Here's a screenshot from Google Chrome on the same box. It looks like Chrome uses some heuristics and doesn't process frames without changes. The high CPU usage parts are when the mouse is moved, and the low ones are when the mouse cursor is still.
If Google uses DMABuf sharing (as Jan claims), it seems to hit the GPU->CPU bottleneck just as Firefox does with DMABuf enabled.
Assignee | ||
Comment 77•2 years ago
I'm seeing about equal CPU usage in both cases in Chromium, which is obviously wrong and needs to be fixed. Still, with MemFD buffers the compositor has to process all of them, while we, based on Martin's change, can use only those we will really need. This makes a difference when DMABufs are used, because the compositor doesn't need to process any of them and we again process only those we will really use.
I think I will need to revisit https://webrtc-review.googlesource.com/c/src/+/270620, this is what makes DMABufs slower on the WebRTC side.
Assignee | ||
Comment 78•2 years ago
(In reply to grulja from comment #77)
I'm seeing about equal CPU usage in both cases in Chromium, which is obviously wrong and needs to be fixed. Still, with MemFD buffers the compositor has to process all of them, while we, based on Martin's change, can use only those we will really need. This makes a difference when DMABufs are used, because the compositor doesn't need to process any of them and we again process only those we will really use.
I think I will need to revisit https://webrtc-review.googlesource.com/c/src/+/270620, this is what makes DMABufs slower on the WebRTC side.
This https://webrtc-review.googlesource.com/c/src/+/297501 should fix it. In my testing, using DMABufs now consumes less CPU than using shared memory buffers.