Closed Bug 1683341 Opened 5 years ago Closed 4 years ago

Top half of the browser has a solid color when using WebRender/GLX/Gnome X11 with GTK_CSD=1/proprietary Nvidia

Categories

(Core :: Widget: Gtk, defect)

Firefox 82
x86_64
Linux
defect

Tracking


RESOLVED DUPLICATE of bug 1696905
Tracking Status
firefox84 --- disabled
firefox85 --- disabled
firefox86 --- disabled
firefox89 + wontfix
firefox90 --- fixed
firefox91 --- fixed

People

(Reporter: jan, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: correctness, nightly-community, regression)

Attachments

(8 files)

+++ This bug was initially created as a clone of Bug #1663273 +++

See bug 1663273 comment 80 + comment 81.

I don't know if this is the right place, but I don't want to reopen the other closed tickets.

I was using the beta, and it auto-updated to 87b4 (?). I don't remember what the previous version was (I hadn't updated for some time).

This white block appeared every time I tried to enable WebRender, and after the update it happened again. I actually had to fall back to stable, since the "disable Title Bar" feature stopped working as well (that is, the tabs are no longer rendered in place of the title bar).

I tried Nightly 88.0a1 (2021-03-01) (64-bit) on Elementary OS with an RTX 3070 and NVIDIA driver 460.39, and the same bugs occurred: a white block at the top, and the title bar option stopped working.

I'm attaching some files to show it.

Is the GTK_CSD=1 env variable significant here? Do you see it when it's not set?

Flags: needinfo?(ricardopieper)

Okay, I tested several scenarios.
TL;DR: GTK_CSD=0 fixes both bugs: the white box and the broken "disable title bar" feature.

The printenv command shows that GTK_CSD is already set to 1 by default, and MOZ_X11_EGL is also 1.

When I just click the firefox executable in the file manager, the bug happens and the compositor is WebRender.
If I just run ./firefox on the command line, it falls back to WebRender (software).
On stdout, I see errors like:

[GFX1-]: Failed to create EGLSurface!: 0x3009
[GFX1-]: Failed to create EGLSurface!: 0x3009
[GFX1-]: Failed GL context creation for WebRender: 0
[GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[GFX1-]: Failed to connect WebRenderBridgeChild.
[GFX1-]: Fallback WR to SW-WR

If I run GTK_CSD=1 ./firefox, nothing changes, of course; it still runs with WebRender (software).
If I run GTK_CSD=0 ./firefox, it runs with hardware WebRender and the white box disappears.

I also tried to remove the MOZ_X11_EGL environment variable from my bashrc, these are the results:

If I just run ./firefox on the command line, the bug happens and the compositor is WebRender (just like clicking the executable).
As a sanity check, I ran MOZ_X11_EGL=1 ./firefox, and it used software WebRender.
If I run GTK_CSD=0 ./firefox, there is no white box and it uses hardware WebRender; it works fine.
If I run MOZ_X11_EGL=1 GTK_CSD=0 ./firefox it also works fine.
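The combination found to work here (GTK_CSD=0, MOZ_X11_EGL unset) could be wrapped in a small launcher so it doesn't depend on the values exported at login. A minimal sketch, assuming bash and an `env` that supports `-u` (GNU coreutils does); the function name `firefox_csd_workaround` and the `FIREFOX_BIN` override are hypothetical and not part of Firefox:

```shell
#!/usr/bin/env bash
# Hypothetical launcher for the GTK_CSD=0 workaround; not shipped with Firefox.

firefox_csd_workaround() {
  # Binary to run; FIREFOX_BIN is an illustrative override (defaults to "firefox").
  local bin="${FIREFOX_BIN:-firefox}"
  # env -u drops MOZ_X11_EGL entirely (whether it was exported as 0 or 1),
  # and GTK_CSD=0 overrides the login value for this one process only.
  env -u MOZ_X11_EGL GTK_CSD=0 "$bin" "$@"
}
```

For example, `firefox_csd_workaround about:support` should open the support page, where the Compositing row shows whether hardware WebRender is in use. The override only applies to that process, so the default set at login stays unchanged.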

Flags: needinfo?(ricardopieper)

So perhaps I should have read the title of this bug and tested this earlier? Sorry about that, though it seems the bug does still happen.

I wonder why you have set GTK_CSD and MOZ_X11_EGL. Did you set them yourself, or do they come from the distro?

GTK_CSD is obsoleted and we should not use that in Firefox directly - I'll a check from
https://searchfox.org/mozilla-central/rev/f83c67b24fed1d677c5deafe7b31f5656c2656ec/widget/gtk/nsWindow.cpp#8175
(Bug 1209659 may be related)

As for MOZ_X11_EGL, it is an experimental feature and it isn't finished yet (Bug 1677203).
Thanks.

Flags: needinfo?(ricardopieper)

I do remember messing around with MOZ_X11_EGL, though I don't remember exactly why. I went down a rabbit hole trying to enable hardware video acceleration and changed a ton of settings; eventually I reached MOZ_X11_EGL and tried to do something with it.

As for GTK_CSD, it is set by /etc/profile.d/gtk_csd.sh, which I don't remember ever touching. I think it is a distro setting?

I don't know what GTK_CSD and MOZ_X11_EGL really mean or what the consequences of enabling or disabling them are, but I'll try disabling CSD and removing the EGL variable from my bashrc.
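One way to find out where such variables come from is to search the usual login scripts for them. A minimal sketch, assuming bash and GNU grep; `find_var_sources` is a hypothetical helper, and its default search set (/etc/profile.d/*.sh plus ~/.bashrc) only covers the two places mentioned in this thread, not every file a session may source:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files that mention a given variable name.
# Usage: find_var_sources NAME [FILE...]

find_var_sources() {
  local var="$1"
  shift
  # With no files given, fall back to the distro login scripts and the user's bashrc.
  if [ "$#" -eq 0 ]; then
    set -- /etc/profile.d/*.sh "$HOME/.bashrc"
  fi
  # -l: list matching file names only; -w: whole-word match, so GTK_CSD
  # does not match a longer name such as GTK_CSD_FOO; errors for missing files are hidden.
  grep -lw -- "$var" "$@" 2>/dev/null
}
```

On a system like the reporter's, `find_var_sources GTK_CSD` would be expected to print /etc/profile.d/gtk_csd.sh, and `find_var_sources MOZ_X11_EGL` would point at ~/.bashrc if the variable was exported there.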

Flags: needinfo?(ricardopieper)

I'll leave a quick comment here, though: assuming GTK_CSD is in fact set by the distro, I think the upgrade to 87 beta 4 did break things, beyond WebRender, for other users of the same distro. I'm not sure the EGL setting broke anything.

I didn't understand what you meant in your comment about that line of code: whether it will be removed or modified, or whether GTK_CSD just triggers that check.

The titlebar and borders are fixed in Bug 1693460.

Can you check if the WebRender/GLX bug is a regression or not? Please use mozregression tool for it:
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Use_Mozregression_tool

Thanks.

Flags: needinfo?(ricardopieper)

Hi Martin,

I tried several configurations:

This is my environment:
echo $MOZ_X11_EGL: Variable is unset
echo $GTK_CSD: value is 0

Regressions with mozregression --good 86 --bad 87:
None

Regressions with MOZ_X11_EGL=0 mozregression --good 86 --bad 87
None

Regressions with MOZ_X11_EGL=1 mozregression --good 86 --bad 87
None

Regression with GTK_CSD=1 mozregression --good 86 --bad 87
This test only changes the GTK_CSD variable, leaving MOZ_X11_EGL unset. Both the title bar bug and the white block are observed.
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=8471b70b4df960d3599dcd951f0b05fb4f7bd420&tochange=12744d62ec8944fe64bb028a68bcab2c4665cf7b

Regression with GTK_CSD=1 MOZ_X11_EGL=0 mozregression --good 86 --bad 87
This one only produces the title bar bug
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3d4360e021fa62a3dfc40df2295038622f7cfa96&tochange=d6388772d4c63331ad4dfebdbaa945364dada2e1

Regression with GTK_CSD=1 MOZ_X11_EGL=1 mozregression --good 86 --bad 87
Exactly the same results as above. It seems the code only checks whether MOZ_X11_EGL is set; whether it's 1 or 0 doesn't matter?
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3d4360e021fa62a3dfc40df2295038622f7cfa96&tochange=d6388772d4c63331ad4dfebdbaa945364dada2e1

With GTK_CSD=0 there are no regressions. Is GLX the thing being replaced by EGL?
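The guess that only the presence of MOZ_X11_EGL matters corresponds to the difference between the two shell tests below. This is a hypothetical illustration of the two semantics only; nothing in this thread shows how Firefox actually reads the variable, and both function names are made up:

```shell
#!/usr/bin/env bash
# Hypothetical illustration; neither function reflects Firefox's real check.

# "Presence" semantics: any value, including 0 or an empty string, enables EGL.
egl_requested_if_set() {
  [ -n "${MOZ_X11_EGL+set}" ]
}

# "Value" semantics: only the literal value 1 enables EGL.
egl_requested_if_one() {
  [ "${MOZ_X11_EGL-}" = "1" ]
}

# Compare both interpretations for the values tried in the bisections above.
for value in unset 0 1; do
  if [ "$value" = unset ]; then unset MOZ_X11_EGL; else MOZ_X11_EGL="$value"; fi
  presence=no; exact=no
  egl_requested_if_set && presence=yes
  egl_requested_if_one && exact=yes
  printf '%s: presence=%s value=%s\n' "$value" "$presence" "$exact"
done
unset MOZ_X11_EGL
```

The loop prints `unset: presence=no value=no`, `0: presence=yes value=no` and `1: presence=yes value=yes`. Under presence semantics, MOZ_X11_EGL=0 behaves like MOZ_X11_EGL=1, which would be consistent with the identical regression ranges reported for both values.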

Flags: needinfo?(ricardopieper)

(In reply to ricardopieper from comment #11)

Thanks. Let's wait until Bug 1693460 hits the nightly builds.

With GTK_CSD=0 there are no regressions. Is GLX the thing being replaced by EGL?

Not yet, there's a plan to use it for Mesa drivers only.

So can you please re-test with GTK_CSD=1?
Thanks.

Flags: needinfo?(ricardopieper)

Hi, sorry for the delay.

GTK_CSD=1 still results in the top half of the browser having a solid color. I downloaded the latest nightly.

Flags: needinfo?(ricardopieper)

Okay. And is the titlebar bug fixed at least?

Flags: needinfo?(ricardopieper)

The hide title bar feature seems to be working fine though.

Flags: needinfo?(ricardopieper)

Maybe this is irrelevant information, but the MOZ_X11_EGL=1 flag now crashes the browser.
Should I report this elsewhere?

GFX1-: Failed to connect WebRenderBridgeChild.
GFX1-: Failed to create EGLSurface!: 0x3009
GFX1-: Failed to create EGLSurface!: 0x3009
GFX1-: Failed GL context creation for WebRender: 0

(In reply to ricardopieper from comment #17)

EGL is not supposed to run with NVIDIA drivers; we don't enable EGL there, so don't use MOZ_X11_EGL.

Let me summarize, please.

So IIRC you get a white square when running the latest Nightly with GTK_CSD=1 and WebRender enabled, right? (I don't mention MOZ_X11_EGL because it doesn't work with NVIDIA cards.)

Flags: needinfo?(ricardopieper)

Also please attach content of about:support page, Thanks.

Attached file aboutsupport.json
Flags: needinfo?(ricardopieper)

(^ I accidentally posted the json above before posting this text)

Yes, that was correct. GTK_CSD=1 and WebRender enabled. I checked the about:support page before posting those results. It was just showing WebRender, not WebRender (software something)

I would like to post the about:support here just to prove it, but now I have a new issue where it doesn't even try to run with WebRender enabled. MOZ_X11_EGL is unset (it has no value), but I still get this error:

[GFX1-]: glxtest: libEGL initialize failed
[GFX1-]: glxtest: X error, error_code=2, request_code=151, minor_code=3
[GFX1-]: glxtest: process failed (exited with status 1)
[GFX1-]: Failed GL context creation for WebRender: 0
[GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[GFX1-]: Failed to connect WebRenderBridgeChild.
[GFX1-]: Fallback (SW-)WR to Basic

I'm a bit lost. I attached the raw about:support json.

This ran in the latest nightly, brand new profile, and I just set the webrender.enabled flag to True on the about:config page.

Also, GTK_CSD=1 or 0 doesn't change the result.

Also, it isn't really a white square: the color seems to be determined by the page's background color. Just to make sure there is no misunderstanding here.

Summary:

  • Affected users: Nightly+Early Beta on Gnome X11 with GTK_CSD=1 env var on proprietary Nvidia (bug 1673752 comment 5 enabled WR on proprietary Nvidia 460.32.03 or newer)
  • Comments 1 to 25 describe the same issue as comment 0:

(Darkspirit from bug 1663273 comment 80)

Proprietary Nvidia, GTX1060, Debian Testing

Gnome X11 with GTK_CSD=1:
  • Basic GLX: fine
  • WR GLX: this bug is still present
  • SWWR GLX: bug 1674473
  • Basic EGL: fine
  • WR EGL: fallback, see below
  • SWWR EGL: bug 1674473

WR GLX/Gnome X11 with GTK_CSD=1/proprietary Nvidia: top half with solid color

GTK_CSD=1 mozregression --repo try --launch 230f8c44f18b85947446ab2c9ba98dd17380b716 --pref gfx.webrender.all:true -a about:support

WR EGL/Gnome X11 with GTK_CSD=1/proprietary Nvidia: GL context failure:

Gnome X11 with GTK_CSD=1 and MOZ_X11_EGL=1 on proprietary Nvidia:
Almost the same as in comment 48, but WebRender now falls back to Basic instead of OpenGL (bug 1677825):
GTK_CSD=1 MOZ_X11_EGL=1 mozregression --repo try --launch 230f8c44f18b85947446ab2c9ba98dd17380b716 --pref gfx.webrender.all:true -a about:support

Compositing Basic
(#0) Error Failed to create EGLSurface!: 0x3009
(#1) Error Failed to create EGLSurface!: 0x3009
(#2) Error Failed GL context creation for WebRender: 0
(#3) Error FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
(#4) Error Failed to connect WebRenderBridgeChild.

Just wanted to report that I also experience this issue:

  • for the last few months, while WebRender was not yet enabled by default, I tried enabling it after FF updates to check whether it worked (it did not).
  • since yesterday, when FF was updated to 89.0 on elementary OS.

Setting GTK_CSD=0 as a workaround also works in my case.

specs:
FF 89.0 (and versions before) - troubleshooting info: https://zerobin.net/?d1c2ef50d5602369#CzK+voiAKNcpvzztScxSZq8ijvRrH+XsYdh3ZwcLuYM=
elementary OS 5.1.7 (Built on Ubuntu 18.04.4 LTS)
Linux 5.4.0-74-generic
GTK 3.22.30
Nvidia 2070 Super @ proprietary 460.80

[Tracking Requested - why for this release]:
It seems this was shipped to release.
Elementary OS seems to have GTK_CSD=1 environment variable by default. Proprietary Nvidia users are affected.

Has Regression Range: --- → yes

Hm, this somehow fell through the cracks and went into release a bit too soon I guess (https://searchfox.org/mozilla-central/source/widget/gtk/GfxInfo.cpp#743-749).

Andrew, I think we need to limit WR NV prop. driver rollout to DEs where this doesn't happen.

Martin, do we really need these different CSD types? Can we do anything to make the DEs support the CSD types that are not affected by this bug?

Flags: needinfo?(stransky)
Flags: needinfo?(aosmond)

(In reply to Robert Mader [:rmader] from comment #32)

Martin, do we really need these different CSD types? Can we do anything to make the DEs support the CSD types that are not affected by this bug?

I don't think it's related to CSD, because we use CSD on Elementary OS by default, no matter if GTK_CSD is set or not, see:

https://searchfox.org/mozilla-central/rev/e8904db16ac45bff0ffe65e7289f8d2f00c48c48/widget/gtk/nsWindow.cpp#8700
https://searchfox.org/mozilla-central/rev/e8904db16ac45bff0ffe65e7289f8d2f00c48c48/widget/gtk/nsWindow.cpp#8738

It may be related to the disabled titlebar, so perhaps the window configuration or the GL window is wrong.

Andy, can you try enabling the system titlebar (Hamburger menu -> Customize Toolbar -> 'Title bar' check box in the bottom left corner) and test again?

Flags: needinfo?(stransky) → needinfo?(tsa.andy)

Hey Martin,

without setting GTK_CSD=0 manually, and with the title bar enabled as you suggested, the issue still exists.

Flags: needinfo?(tsa.andy)

Hi there, I originally reported bug 1714355, which was marked as a duplicate. I upgraded my drivers again (nVidia prop. 465.27) to check some of the suggestions, and it turned out that the problem does not persist in Firefox 90 DE. I thought that might be helpful information, so I made a screenshot comparing 90 DE with 89 (as a Flatpak).

(In reply to nr from comment #38)

problem does not persist in Firefox 90 DE.

Please attach your Firefox 90 DE about:support information.

Attached file about.support.json

This is very interesting. Maybe there's a bug in the NV prop. driver concerning depth buffers, and in FF90 we stopped using them (see bug 1711490).

Andy, Ricardo, can you confirm that the issue is resolved in beta/nightly? That would be great news!

Flags: needinfo?(tsa.andy)
Flags: needinfo?(ricardopieper)

I have the same issue (#1714355) and I can confirm it works in v90b (flatpak: flathub-beta)

Right now for v89, as we all know, the workaround is to disable hardware acceleration

Could someone try to find out which commit fixed it? At the end, you get a pushlog URL:

$ pip3 install --user mozregression
$ mozregression --find-fix --bad 2021-03-22 --good 2021-05-31 --pref gfx.webrender.all:true

Hi Robert,

I can confirm that the issue does not exist on 90.0b5 (64-Bit)

Flags: needinfo?(tsa.andy)
 6:51.57 INFO: No more integration revisions, bisection finished.
 6:51.57 INFO: First good revision: de6dfc676a6877428343a1c0fdbb099fc6b3ebfd
 6:51.57 INFO: Last bad revision: 6bce4e61777b33736a0bde8ec3bb88a26a8f430d
 6:51.57 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6bce4e61777b33736a0bde8ec3bb88a26a8f430d&tochange=de6dfc676a6877428343a1c0fdbb099fc6b3ebfd

Thanks for confirming. So this was indeed an issue with the depth buffer (bug 1711490, mentioned in comment 41, only removed it; we actually stopped using it in bug 1696905).

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE

Arthur, this is apparently another bug in the NVIDIA driver. It is now fixed in FF90 by using CPU-side culling instead of a depth buffer, but you may want to take a look nevertheless, in case other applications also run into it.

Flags: needinfo?(ahuillet)

I have hit this issue myself and was curious to understand the root cause.
What exactly is the bug thought to be? I haven't seen a description of what was wrong with the depth buffer.
Do you maybe have a standalone reproducer or an Apitrace to inspect the GL command stream?
Thanks

Flags: needinfo?(ahuillet) → needinfo?(robert.mader)

I'm very sorry, I currently don't have a device to reproduce the issue or capture a trace. All I can tell you is that the depth buffer was used to order overlapping tiles when compositing the final image (IIUC), and that it worked correctly on Mesa drivers and on other OSs like Android. As the bug apparently only appears in combination with a title bar, chances are that something odd is happening in the GTK backend that wouldn't happen on other platforms.

Flags: needinfo?(robert.mader)

(I'm not a developer.)
This bug also occurred with "WebRender/GLX/KDE with disabled compositing/proprietary Nvidia" when a non-alpha visual and XShape were used (bug 1663273 comment 45). Bug 1663273 comment 82 fixed it for KDE back then.

Then I filed this bug for the unfixed "WebRender/GLX/Gnome X11 with GTK_CSD=1 environment variable/proprietary Nvidia" case.

I tested proprietary Nvidia with WebRender/GLX/Gnome X11, with GTK_CSD both on and off, on Fedora 34, but I can't reproduce it.
It may be related to Elementary OS.

Should Firefox really be left broken for Nvidia/Elementary OS users until the next release? This bug has been closed as a duplicate and only 89 is affected. Firefox 90, which contains the fix, will be released in two weeks.

Flags: needinfo?(ricardopieper)
Flags: needinfo?(aosmond)