Closed Bug 1699864 Opened 3 years ago Closed 3 years ago

[regression since Nightly 88 2021-03-18-21-35-31] MOZ_X11_EGL=1 causes critical content+chrome graphical glitching on X11

Categories

(Core :: Graphics: WebRender, defect)

Firefox 88
All
Linux
defect

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox86 --- unaffected
firefox87 --- unaffected
firefox88 --- fixed
firefox89 --- fixed

People

(Reporter: ronan.jouchet, Assigned: rmader)

References

(Regression)

Details

(Keywords: nightly-community, regression)

Attachments

(4 files, 1 obsolete file)

Steps to reproduce

  1. On Firefox Nightly >= 88.0a1 2021-03-18-21-35-31 with a fresh profile,
  2. Start Firefox with environment variable MOZ_X11_EGL=1 (which I use to enable Hardware Video Acceleration)

Expected behavior

Firefox works (and with further config, videos are hardware accelerated, but that's not the point of this bug)

Actual behavior

The whole window (content & chrome) is horribly glitched; see attached GIF.

Environment

  • Arch Linux, up-to-date
  • Regression window:
    • Last GOOD build: 2021-03-18-09-31-09-mozilla-central
    • First BAD build: 2021-03-18-21-35-31-mozilla-central
  • Official Firefox 88 build from https://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/ (no AUR)
  • lsgpu says: 00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07); Subsystem: Lenovo Device 2231; Kernel driver in use: i915
  • Feel free to ask for extra debug info.
Flags: needinfo?(stransky)

Forgot to mention in Environment: yes I'm using xorg and not wayland.

Indeed, this does not happen on Wayland, while it does on X11. I'm on Ubuntu Groovy (not one of the flavors) by the way.

This seems to be a regression from bug 1684194, since the timestamp of the commit lines up perfectly.

Attached file glxinfo.txt

This seems to be a regression from bug 1684194, since the timestamp of the commit lines up perfectly.

Thanks. In that case, here's my glxinfo, in case that's useful.

I can confirm this on Intel Skylake. This appears to happen only with HWWR (which makes sense, as SWWR does not use the EGL compositor).

Interestingly the top-left and top-right corners appear not to be affected. These are the areas that are not part of the opaque region[1] - I'll try to have a look into this.
1: https://searchfox.org/mozilla-central/source/widget/gtk/nsWindow.cpp#5673-5678

Component: General → Graphics: WebRender
Product: Firefox → Core
Regressed by: 1684194
Has Regression Range: --- → yes

I seem to have filed another bug for this: bug 1700030...

I'll look at it after the 87.0 release. I can reproduce that on Intel 630 too.

So what appears to happen is that we render semi-transparent, at something like 50% opacity. Removing the opaque region makes the content render "correctly" again, just semi-transparent. Setting the opaque region makes the compositor skip the clearing step, so the pixel values quickly accumulate to their maximum and everything becomes solid white. Not sure yet why that is.
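
A toy model of the accumulation described above, not the actual compositor math. It assumes each frame is blended onto whatever the buffer already holds with a premultiplied-style "over" formula, while the source color was never scaled by its alpha. The blend formula, the 50% alpha and the function name are all assumptions made for illustration.

```python
def composite_frames(src_color, alpha, frames, clear_each_frame):
    """Blend the same source pixel onto a destination for several frames.

    Hypothetical model: dst = src + dst * (1 - alpha), clamped to [0, 1].
    """
    dst = 0.0
    for _ in range(frames):
        if clear_each_frame:
            dst = 0.0  # buffer is cleared before the new frame is drawn
        # Without a clear, the previous frame's value leaks into this one.
        dst = min(1.0, src_color + dst * (1.0 - alpha))
    return dst


with_clear = composite_frames(0.6, 0.5, frames=10, clear_each_frame=True)
without_clear = composite_frames(0.6, 0.5, frames=10, clear_each_frame=False)
```

Under these assumptions the cleared buffer stays at the source value, while the uncleared one climbs frame by frame until it hits the clamp, which would look like the window washing out to white.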

The issue is that we use a 24 bit color depth while we need to use 32. This is because the shared GL context is created without a compositor widget, falling back to CreateConfigScreen() which uses gfxVars::ScreenDepth().

For the OGL compositor I added some code in CreateForCompositorWidget() which gets the proper depth from the widget - this trick won't work here. I suppose the best fix would be to make gfxVars::ScreenDepth() return the right value.

The screen color depth is used as default depth for GL contexts.
On X11 it was reported alpha-less - likely a relic from times when
this was actually performance relevant (it is not anymore these days).

This results in shared GL context getting created with 24bit color
depth, which again results in the window being semi-transparent.

Explicitly request the color depth including alpha - this is what
we already get on Wayland and what we need on X11 as well, especially
in a WebRender-only future.
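
To make the 24bit versus 32bit distinction concrete, here is a small sketch that packs a pixel both ways. The helper names are made up for illustration, and the channel layout (8 bits per channel, alpha in the top byte) is an assumption chosen to match the ARGB8888 format discussed in this bug.

```python
def pack_rgb888(r, g, b):
    """Pack three 8-bit channels into a 24-bit value (no alpha)."""
    return (r << 16) | (g << 8) | b


def pack_argb8888(a, r, g, b):
    """Pack four 8-bit channels into a 32-bit value, alpha in the top byte."""
    return (a << 24) | (r << 16) | (g << 8) | b


def depth_bits(has_alpha):
    """Bits per pixel for 8-bit channels, with or without an alpha channel."""
    return 8 * (4 if has_alpha else 3)


opaque_rgb = pack_rgb888(0x12, 0x34, 0x56)
opaque_argb = pack_argb8888(0xFF, 0x12, 0x34, 0x56)
```

The 24-bit value has no room for alpha at all, so a context created at that depth cannot carry the opacity information the window needs.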

Assignee: nobody → robert.mader

I rebuilt 88b1 with that patch and it fixes both the black background and translucent window issues. Thanks!

Wow! That was fast. We configure the visual at nsWindow::Create(), so perhaps we should use the correct visual info from it; I'm not sure if gdk_screen_get_rgba_visual() is always valid.

Ah I see - so IIUC what we actually would want here is choosing a visual via EGL, just as we do in nsWindow::ConfigureX11GLVisual. However, GLContextEGL::FindVisual is currently broken[1] and we'd need to use the GLX version (GLContextGLX::FindVisual). This may be a chance to also fix bug 1667621 - at least make it not break where GLX doesn't (bug 1667621 comment 3). I'll check if I can come up with something - alternatively we can hardcode 32 bit as suggested by Martin[2] or add a fallback to gdk_screen_get_system_visual for cases where gdk_screen_get_rgba_visual returns NULL - i.e. all systems that don't offer ARGB8888, which is the only implemented value in GTK[3].

1: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2376
2: https://phabricator.services.mozilla.com/D109521#3563574
3: https://gitlab.gnome.org/GNOME/gtk/-/blob/master/gdk/x11/gdkvisual-x11.c#L184-186

See Also: → 1667621
Status: UNCONFIRMED → NEW
Ever confirmed: true

Just noticed: Since ~2021-03-19, 17% of the Nightly Linux user base has switched from WebRender to Basic. (gfxCompositor nightly linux)

Since D108508 the X11/EGL backend creates a shared GL context via
CreateGLContextEGL() which chains up to CreateForCompositorWidget()
with a nullptr widget. With the OGL compositor we relied on the
widget giving us a valid color depth, while now we'd fall back to
gfxVars::ScreenDepth().

On X11 color depth is defined as:

depth means the number of bits in a pixel that are actually used
to determine the pixel color

i.e. on a usual system we would get 24bit.

As we require an alpha channel when using WR, the result would be
disappointing. Thus hardcode 32bit color depth for X11/EGL when
creating contexts without widget for now.
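
The fallback chain described in this commit message can be sketched roughly as follows. This is a simplified model, not the actual Gecko code: the function name, its parameters and the `patched` switch are hypothetical, and the 24bit screen depth is just the typical X11 value mentioned above.

```python
from typing import Optional

X11_SCREEN_DEPTH = 24  # color bits only, as X11 typically reports depth (assumed)
ARGB_DEPTH = 32        # 24 color bits plus 8 alpha bits


def choose_depth(widget_depth: Optional[int],
                 is_x11_egl: bool,
                 patched: bool,
                 screen_depth: int = X11_SCREEN_DEPTH) -> int:
    """Pick the color depth for a new GL context (hypothetical model)."""
    if widget_depth is not None:
        # A compositor widget is available, so trust its depth.
        return widget_depth
    if patched and is_x11_egl:
        # No widget (shared context): hardcode a depth that includes alpha.
        return ARGB_DEPTH
    # Pre-patch behavior: fall back to the alpha-less screen depth.
    return screen_depth


before_fix = choose_depth(None, is_x11_egl=True, patched=False)
after_fix = choose_depth(None, is_x11_egl=True, patched=True)
```

In this model only the widget-less path changes with the patch; contexts created for a real compositor widget keep using the depth the widget reports.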

Attachment #9211065 - Attachment is obsolete: true
Attachment #9211515 - Attachment description: Bug 1699864 - Request 32bit color depth on X11/EGL by default, r=stransky → Bug 1699864 - Request 32bit color depth on Linux/EGL by default, r=stransky
Pushed by jgilbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ad4e73130915
Request 32bit color depth on Linux/EGL by default, r=stransky,jgilbert

I'm not sure if this is relevant information at this point, but this does not go away on X11 for me when I set MOZ_X11_EGL=0 in a terminal, whether before the command — MOZ_X11_EGL=0 firefox-nightly, or by exporting — export MOZ_X11_EGL=0. Presumably, that disables EGL on X11, but I can't check about:support to see if it actually did the trick, since everything becomes pure white in a hurry, so…

I'm sorry in advance if I'm being a bother in any way 😅.

Sandi:

When this happened, I ended up having to killall firefox before restarting with MOZ_X11_EGL=0 had an effect (apparently Firefox, having triggered this bug, entered some state where it doesn't exit gracefully).

(In reply to Dmitry Gutov from comment #21)

Sandi:

When this happened, I ended up having to killall firefox before restarting with MOZ_X11_EGL=0 had an effect (apparently Firefox, having triggered this bug, entered some state where it doesn't exit gracefully).

I literally just logged into X11 to check if another bug was reproducible in the GNOME X11 session (IBus stuff), so there were no previous Firefox instances to speak of. The only thing that I can think of is Thunderbird Daily, since I set it to open automatically at logon, but I wouldn't imagine (or at least wouldn't hope) that it could affect Firefox like this. I'll try killing all Thunderbird instances next time I try this then.

Thanks for the tip!

https://www.reddit.com/r/firefox/comments/m8tpqd/nightly_linux_nvidia_proprietary_driver_egl_no/

Nightly + Linux + Nvidia proprietary driver + EGL no longer works as of 20210319
it will fail to initialize and fall back to software rendering

Tested with Debian Testing/Gnome X11/GTX1060:

  • Mesa/Nouveau looks like comment 0 (HW WR) and is fixed by comment 19
  • Proprietary Nvidia still fails with EGL_BAD_MATCH and falls back to software rendering (SW WR). It worked before bug 1684194.
Summary: [regression since Nightly 88 2021-03-18-21-35-31] MOZ_X11_EGL=1 causes critical content+chrome graphical glitching on Intel GPU → [regression since Nightly 88 2021-03-18-21-35-31] MOZ_X11_EGL=1 causes critical content+chrome graphical glitching on X11
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch

Will this fix also be applied to Firefox 88?

Fix confirmed in 89.0a1 (2021-03-26). Thanks everybody!

Comment on attachment 9211515 [details]
Bug 1699864 - Request 32bit color depth on Linux/EGL by default, r=stransky

Beta/Release Uplift Approval Request

  • User impact if declined: Wrong colors in WebRender when EGL/X11 is enabled.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Fallback to 32-bit color depth when we're missing the color info.
  • String changes made/needed:
Flags: needinfo?(stransky)
Attachment #9211515 - Flags: approval-mozilla-beta?

Comment on attachment 9211515 [details]
Bug 1699864 - Request 32bit color depth on Linux/EGL by default, r=stransky

Approved for 88.0b4.

Attachment #9211515 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Awesome, thanks!

Regressions: 1701863
See Also: → 1735939